Google Data Analytics Capstone Project-Cyclistic

Case Study 1: How Does a Bike-Share Navigate Speedy Success?

Introduction

Welcome to the Cyclistic bike-share analysis case study! In this case study, I will perform many real-world tasks of a junior data analyst. I will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask

Moreno, the director of marketing, has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Key questions

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Prepare

I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. I chose to work with a full year of data, from April 2021 to March 2022. This public data lets me explore how different customer types use Cyclistic bikes. Note, however, that data-privacy issues prohibit me from using riders’ personally identifiable information. This means that I won’t be able to connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.

Key tasks

Download data and store it appropriately.
Identify how it’s organized.
Sort and filter the data.
Determine the credibility of the data.

Loading the Required Packages

# Load the required packages (install any missing ones first with install.packages())

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(skimr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(ggplot2)
library(readr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ tibble  3.2.1
## ✔ purrr   1.0.1     ✔ tidyr   1.3.0
## ✔ stringr 1.5.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Downloading and Importing the data

April2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202104-divvy-tripdata.csv")
May2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202105-divvy-tripdata.csv")
Jun2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202106-divvy-tripdata.csv")
July2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202107-divvy-tripdata.csv")
Aug2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202108-divvy-tripdata.csv")
Sept2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202109-divvy-tripdata.csv")
Oct2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202110-divvy-tripdata.csv")
Nov2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202111-divvy-tripdata.csv")
Dec2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202112-divvy-tripdata.csv")
Jan2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202201-divvy-tripdata.csv")
Feb2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202202-divvy-tripdata.csv")
Mar2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202203-divvy-tripdata.csv")

Combining all the monthly data into one data frame

Master_Trip_Data<-rbind(April2021,May2021,Jun2021,July2021,Aug2021,Sept2021,Oct2021,Nov2021,
                        Dec2021,Jan2022,Feb2022,Mar2022)
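Reading the twelve files one by one works, but the import and combine steps can also be written as a single loop over the folder. The sketch below is a hypothetical helper (`read_trip_folder()` is not part of any package) that assumes every monthly file sits in one folder and ends in `-divvy-tripdata.csv`, as the files above do. It uses base R's `read.csv()` so it runs without extra packages; with readr loaded, `read_csv()` could be swapped in.

```r
# Hypothetical helper: read every monthly Divvy CSV in a folder and stack them.
# Assumes all files share the same columns, as the twelve files above do.
read_trip_folder <- function(dir, pattern = "-divvy-tripdata\\.csv$") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) stop("no trip files found in ", dir)
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, monthly)   # same effect as rbind(April2021, May2021, ...)
  rownames(combined) <- NULL
  combined
}

# Usage with two tiny demo files written to a temporary folder
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = "A1", member_casual = "member"),
          file.path(demo_dir, "202104-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B2", member_casual = "casual"),
          file.path(demo_dir, "202105-divvy-tripdata.csv"), row.names = FALSE)

trips <- read_trip_folder(demo_dir)
print(nrow(trips))
```

Sorting the file names keeps the months in chronological order because the names start with `YYYYMM`.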

Process

Key tasks

Check the data for errors.
Choose your tools.
Transform the data so you can work with it effectively.
Document the cleaning process.

Using the glimpse() and colnames() functions to check the data

glimpse(Master_Trip_Data)

## Rows: 5,723,532
## Columns: 13
## $ ride_id            <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type      <chr> "classic_bike", "docked_bike", "docked_bike", "clas…
## $ started_at         <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ ended_at           <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id   <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name   <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id     <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat          <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng          <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat            <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng            <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual      <chr> "member", "casual", "casual", "member", "casual", "…

colnames(Master_Trip_Data)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

Selecting the columns needed for the analysis and dropping the rest

Master_Trip_Data_01<- Master_Trip_Data %>% 
  select(ride_id,rideable_type,started_at,ended_at,member_casual)

colnames(Master_Trip_Data_01)

## [1] "ride_id"       "rideable_type" "started_at"    "ended_at"     
## [5] "member_casual"

Renaming the columns

Master_Trip_Data_01_Rename<-Master_Trip_Data_01 %>% 
  rename(bike_type = rideable_type,customer_type=member_casual)

colnames(Master_Trip_Data_01_Rename)

## [1] "ride_id"       "bike_type"     "started_at"    "ended_at"     
## [5] "customer_type"

Checking for NA values and duplicate ride IDs

sum(is.na(Master_Trip_Data_01_Rename))

## [1] 0

sum(duplicated(Master_Trip_Data_01_Rename$ride_id))

## [1] 0
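`sum(is.na(...))` returns a single total, which confirms there are no missing values but would not say where any were. A per-column count is often more useful before deciding which columns to drop. The sketch below uses base R's `colSums()` on a small made-up data frame (the values are hypothetical, not taken from the trip data); `anyDuplicated()` is a quick way to confirm the ride IDs are unique.

```r
# Count missing values in each column instead of one grand total.
na_by_column <- function(df) colSums(is.na(df))

# Hypothetical toy data: two rides are missing a station name.
toy <- data.frame(
  ride_id            = c("A", "B", "C"),
  start_station_name = c("State St & Pearson St", NA, NA),
  customer_type      = c("member", "casual", "member")
)

na_counts <- na_by_column(toy)
print(na_counts)  # ride_id 0, start_station_name 2, customer_type 0

# anyDuplicated() returns 0 when every ride_id is unique.
print(anyDuplicated(toy$ride_id))  # 0
```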

Ensuring started_at and ended_at are POSIXct date-times so the ride length can be calculated (read_csv() already parses them as <dttm>, so this step is a safeguard)

Master_Trip_Data_01_Rename$started_at<-as.POSIXct(Master_Trip_Data_01_Rename$started_at,format="%Y-%m-%d %H:%M:%S")
Master_Trip_Data_01_Rename$ended_at<-as.POSIXct(Master_Trip_Data_01_Rename$ended_at,format="%Y-%m-%d %H:%M:%S")

Extracting the date, day, month, year, and day of week

Master_Trip_Data_01_Rename$date<-as.Date(Master_Trip_Data_01_Rename$started_at)
Master_Trip_Data_01_Rename$day<-format(as.Date(Master_Trip_Data_01_Rename$date),"%d")
Master_Trip_Data_01_Rename$Month<-format(as.Date(Master_Trip_Data_01_Rename$date),"%m")
Master_Trip_Data_01_Rename$Year<-format(as.Date(Master_Trip_Data_01_Rename$date),"%Y")
Master_Trip_Data_01_Rename$day_of_week<-format(as.Date(Master_Trip_Data_01_Rename$date),"%A")

Calculating the ride length

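# units = "secs" fixes the unit instead of letting difftime() choose one automatically
Master_Trip_Data_01_Rename$ride_length<-difftime(Master_Trip_Data_01_Rename$ended_at,Master_Trip_Data_01_Rename$started_at,units = "secs")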

glimpse(Master_Trip_Data_01_Rename)

## Rows: 5,723,532
## Columns: 11
## $ ride_id       <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508A80BA…
## $ bike_type     <chr> "classic_bike", "docked_bike", "docked_bike", "classic_b…
## $ started_at    <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-03 12…
## $ ended_at      <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-07 11…
## $ customer_type <chr> "member", "casual", "casual", "member", "casual", "casua…
## $ date          <date> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 2021-04…
## $ day           <chr> "12", "27", "03", "17", "03", "25", "03", "06", "12", "2…
## $ Month         <chr> "04", "04", "04", "04", "04", "04", "04", "04", "04", "0…
## $ Year          <chr> "2021", "2021", "2021", "2021", "2021", "2021", "2021", …
## $ day_of_week   <chr> "Monday", "Tuesday", "Saturday", "Saturday", "Saturday",…
## $ ride_length   <drtn> 1879 secs, 3858 secs, 341859 secs, 1506 secs, 5477 secs…

Converting the ride_length column from difftime to numeric (seconds)

Master_Trip_Data_01_Rename$ride_length<-as.numeric(Master_Trip_Data_01_Rename$ride_length)
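As a check on the ride-length arithmetic, the first two rides in the glimpse() output can be recomputed by hand: 18:25:36 to 18:56:55 is 31 minutes 19 seconds, or 1,879 seconds, and 17:27:11 to 18:31:29 is 1 hour 4 minutes 18 seconds, or 3,858 seconds. The sketch below repeats this in base R. Passing `units = "secs"` to `difftime()` pins the unit; without it, R picks a unit automatically, so the numeric values would not be guaranteed to be seconds. The UTC time zone is an assumption that matches readr's default when parsing these timestamps.

```r
# Start and end times of the first two rides shown by glimpse() above
started <- as.POSIXct(c("2021-04-12 18:25:36", "2021-04-27 17:27:11"), tz = "UTC")
ended   <- as.POSIXct(c("2021-04-12 18:56:55", "2021-04-27 18:31:29"), tz = "UTC")

# Fix the unit explicitly, then drop the difftime class to get plain numbers
ride_length <- as.numeric(difftime(ended, started, units = "secs"))
print(ride_length)  # 1879 3858
```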

Removing zero and negative ride lengths (keeping rides of at least one second)

Master_Trip_Data_01_Rename_Arr<-Master_Trip_Data_01_Rename %>% 
  filter(ride_length>=1)

Using arrange() to check that the negative values were removed from the dataset

Master_Trip_Data_01_Rename_Arr_1<-arrange(Master_Trip_Data_01_Rename_Arr,ride_length)

glimpse(Master_Trip_Data_01_Rename_Arr_1)

## Rows: 5,722,873
## Columns: 11
## $ ride_id       <chr> "3F99442B76EC2051", "2DC9DF08B3526631", "08F12FCBFCB2E2A…
## $ bike_type     <chr> "classic_bike", "classic_bike", "classic_bike", "electri…
## $ started_at    <dttm> 2021-04-16 18:18:00, 2021-04-16 07:58:39, 2021-04-04 23…
## $ ended_at      <dttm> 2021-04-16 18:18:01, 2021-04-16 07:58:40, 2021-04-04 23…
## $ customer_type <chr> "member", "member", "casual", "casual", "member", "membe…
## $ date          <date> 2021-04-16, 2021-04-16, 2021-04-04, 2021-04-07, 2021-04…
## $ day           <chr> "16", "16", "04", "07", "17", "18", "03", "18", "07", "0…
## $ Month         <chr> "04", "04", "04", "04", "04", "04", "04", "04", "04", "0…
## $ Year          <chr> "2021", "2021", "2021", "2021", "2021", "2021", "2021", …
## $ day_of_week   <chr> "Friday", "Friday", "Sunday", "Wednesday", "Saturday", "…
## $ ride_length   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
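It can also help to record how many rides the filter removes. Comparing the row counts in the two glimpse() outputs above gives 5,723,532 - 5,722,873 = 659 rides. The sketch below shows the same check on a hypothetical five-ride data frame, using base R subsetting in place of dplyr's filter().

```r
# Hypothetical ride lengths in seconds, including a negative and a zero
rides <- data.frame(ride_length = c(1879, -12, 0, 341859, 1))

# Keep rides of at least one second, as in the filter above
kept <- rides[rides$ride_length >= 1, , drop = FALSE]

removed <- nrow(rides) - nrow(kept)
print(removed)                # 2
print(min(kept$ride_length))  # 1, so no zero or negative rides remain
```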

Naming the final dataset

Final_file<-Master_Trip_Data_01_Rename_Arr_1

Analyze

Key tasks

Aggregate your data so it’s useful and accessible.
Organize and format your data.
Perform calculations.
Identify trends and relationships.

Calculate the average ride_length for members and casual riders

Final_file %>% 
  group_by(customer_type) %>% 
  summarise(Avg_ride_length=mean(ride_length))

## # A tibble: 2 × 2
##   customer_type Avg_ride_length
##   <chr>                   <dbl>
## 1 casual                  1905.
## 2 member                   802.
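The averages are in seconds, which are hard to read at a glance. Dividing by 60 converts them to minutes: casual rides average roughly 32 minutes and member rides roughly 13 minutes, so a casual ride is about 2.4 times as long. The inputs below are the rounded figures printed in the table, so the results are approximate.

```r
# Rounded averages (seconds) as printed in the table above
avg_secs <- c(casual = 1905, member = 802)

avg_mins <- avg_secs / 60                               # seconds -> minutes
ratio    <- avg_secs[["casual"]] / avg_secs[["member"]] # how many times longer

print(avg_mins)
print(ratio)
```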

Calculate the average ride_length for users by day_of_week

Final_file %>% 
  group_by(day_of_week,customer_type) %>% 
  summarise(Avg_ride_length = mean(ride_length))

## `summarise()` has grouped output by 'day_of_week'. You can override using the
## `.groups` argument.

## # A tibble: 14 × 3
## # Groups:   day_of_week [7]
##    day_of_week customer_type Avg_ride_length
##    <chr>       <chr>                   <dbl>
##  1 Friday      casual                  1806.
##  2 Friday      member                   788.
##  3 Monday      casual                  1889.
##  4 Monday      member                   778.
##  5 Saturday    casual                  2057.
##  6 Saturday    member                   900.
##  7 Sunday      casual                  2245.
##  8 Sunday      member                   921.
##  9 Thursday    casual                  1673.
## 10 Thursday    member                   754.
## 11 Tuesday     casual                  1646.
## 12 Tuesday     member                   751.
## 13 Wednesday   casual                  1666.
## 14 Wednesday   member                   755.
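The table above is sorted alphabetically (Friday, Monday, Saturday, ...) because day_of_week is a character column. Converting it to a factor with the levels in calendar order makes group_by(), arrange(), and ggplot2 follow Sunday to Saturday without repeating scale_x_discrete(limits = ...) in every plot. This sketch assumes English day names, since format(..., "%A") depends on the system locale.

```r
# Calendar order for English day names (format(date, "%A") is locale-dependent)
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# A few day names in the alphabetical order the table uses
days <- c("Friday", "Monday", "Saturday", "Sunday")

days_ordered <- factor(days, levels = day_levels)
print(sort(days_ordered))  # Sunday Monday Friday Saturday

# Applied to the document's data, the same idea would be:
# Final_file$day_of_week <- factor(Final_file$day_of_week, levels = day_levels)
```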

Calculate the number of rides for users by day_of_week

Final_file %>% 
  group_by(day_of_week,customer_type) %>% 
  summarise(Numbers_of_rides=n())

## `summarise()` has grouped output by 'day_of_week'. You can override using the
## `.groups` argument.

## # A tibble: 14 × 3
## # Groups:   day_of_week [7]
##    day_of_week customer_type Numbers_of_rides
##    <chr>       <chr>                    <int>
##  1 Friday      casual                  364237
##  2 Friday      member                  453072
##  3 Monday      casual                  292960
##  4 Monday      member                  439405
##  5 Saturday    casual                  549945
##  6 Saturday    member                  431302
##  7 Sunday      casual                  482746
##  8 Sunday      member                  387681
##  9 Thursday    casual                  293604
## 10 Thursday    member                  475298
## 11 Tuesday     casual                  276338
## 12 Tuesday     member                  490059
## 13 Wednesday   casual                  286364
## 14 Wednesday   member                  499862
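Summing the table gives 2,546,194 casual rides and 3,176,679 member rides, which together match the 5,722,873 rows in the cleaned data. The same counts also quantify the weekend pattern: about 41% of casual rides fall on Saturday or Sunday, compared with about 26% of member rides. The sketch below recomputes these figures in base R from the counts copied out of the table.

```r
# Rides per day of week, copied from the table above
rides <- data.frame(
  day_of_week   = rep(c("Friday", "Monday", "Saturday", "Sunday",
                        "Thursday", "Tuesday", "Wednesday"), each = 2),
  customer_type = rep(c("casual", "member"), times = 7),
  n = c(364237, 453072, 292960, 439405, 549945, 431302, 482746, 387681,
        293604, 475298, 276338, 490059, 286364, 499862)
)
rides$weekend <- rides$day_of_week %in% c("Saturday", "Sunday")

# Total rides and weekend rides per customer type (named numeric vectors)
totals  <- sapply(split(rides$n, rides$customer_type), sum)
weekend <- sapply(split(rides$n[rides$weekend],
                        rides$customer_type[rides$weekend]), sum)

share <- weekend / totals  # fraction of each group's rides on weekends
print(totals)
print(round(share, 3))
```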

Checking the minimum, maximum, and average ride length by customer type

Final_file %>% 
  group_by(customer_type) %>% 
  summarise(Min_ride=min(ride_length),max_ride=max(ride_length),Avg_ride=mean(ride_length))

## # A tibble: 2 × 4
##   customer_type Min_ride max_ride Avg_ride
##   <chr>            <dbl>    <dbl>    <dbl>
## 1 casual               1  3356649    1905.
## 2 member               1    93594     802.
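The maxima point to extreme outliers: the longest casual ride is 3,356,649 seconds (about 39 days) and the longest member ride is 93,594 seconds (about 26 hours). Because a mean is pulled up by such values, reporting the median alongside it gives a more robust picture; in the dplyr pipeline above that would mean adding median(ride_length) inside summarise(). The sketch below illustrates the effect with base R's aggregate() on a hypothetical six-ride data frame in which each group includes one very long ride.

```r
# Hypothetical rides (seconds): each group has one very long outlier
toy <- data.frame(
  customer_type = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length   = c(600, 1200, 3356649, 400, 700, 93594)
)

# The mean is dominated by the outlier; the median is not
means   <- aggregate(ride_length ~ customer_type, data = toy, FUN = mean)
medians <- aggregate(ride_length ~ customer_type, data = toy, FUN = median)
print(means)
print(medians)  # casual 1200, member 700
```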

Share

Key tasks

Determine the best way to share your findings.
Create effective data visualizations.
Present your findings.
Ensure your work is accessible.

In this phase, I group the visualizations by customer type, bike type, and ride length.

1. Visualize the data by customer type

Total number of rides by customer type

Total_customer<-Final_file %>% 
  select(customer_type) %>% 
  count(customer_type)

ggplot(Total_customer,aes(x=customer_type,y=n,fill=customer_type))+
  geom_bar(stat = "identity")+
  geom_text(aes(label=n),vjust=-0.4)+theme_minimal()+
  labs(title = "Casuals and Members distribution",x="Casuals x Members",y="Total Counts")+
  scale_fill_brewer(palette = "Blues")
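The bar labels show raw counts only; adding each group's share of all rides can make the comparison easier to read. Using the totals summed from the day-of-week table (2,546,194 casual and 3,176,679 member rides), casual riders account for about 44.5% of rides and members about 55.5%. The sketch below builds such labels in base R; the `label` column is a hypothetical addition that could be passed to geom_text(aes(label = label)) in place of n.

```r
# Ride totals per customer type, summed from the day-of-week table
Total_customer <- data.frame(customer_type = c("casual", "member"),
                             n = c(2546194, 3176679))

# Share of all rides, in percent
Total_customer$pct <- 100 * Total_customer$n / sum(Total_customer$n)

# Label such as "2,546,194 (44.5%)"
Total_customer$label <- sprintf("%s (%.1f%%)",
                                format(Total_customer$n, big.mark = ","),
                                Total_customer$pct)
print(Total_customer$label)
```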

Types of bikes used by each customer type

ggplot(data = Final_file)+geom_bar(mapping = aes(x=customer_type,fill=bike_type))+
  facet_wrap(~bike_type)+theme_minimal()+
  labs(title = "Types of bikes used by each customer type",
       x="Casuals x Members",y="Number of rides")+
  scale_fill_brewer(direction = -1)

Number of rides by customer type and day of week

Day_of_week <-Final_file %>% 
  group_by(customer_type,day_of_week) %>% 
  summarise(number_of_rides=n())

## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.

ggplot(data = Day_of_week)+geom_col(mapping = aes(x=day_of_week,y=number_of_rides,fill=customer_type))+
  facet_wrap(~customer_type)+theme_minimal()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday", 
                            "Wednesday", "Thursday", "Friday", "Saturday"))+
  scale_fill_brewer(palette = "Paired")+
  labs(title = "Casual Riders and Annual Members by Day of Week")

Number of rides by customer type and month

Month<-Final_file %>% 
  group_by(customer_type,Month) %>% 
  summarise(number_of_rides=n())

## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.

ggplot(data = Month)+geom_col(mapping = aes(x=Month,y=number_of_rides,fill=customer_type))+
  facet_wrap(~customer_type)+
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  scale_fill_brewer(direction = -1)+
  labs(title = "Casual Riders and Annual Members by Month")
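The Month column holds "01" to "12" as text, so the monthly charts start with January 2022 even though the study year runs from April 2021 to March 2022. One option, sketched below under the assumption that the period of interest is exactly this April-to-March window, is a factor whose levels follow the study period.

```r
# Study period: April 2021 ("04") through March 2022 ("03")
study_months <- sprintf("%02d", c(4:12, 1:3))

# A few month codes as they appear in the Month column
months <- c("01", "04", "12", "03")

months_ordered <- factor(months, levels = study_months)
print(sort(months_ordered))  # 04 12 01 03

# Applied to the document's data, the same idea would be:
# Final_file$Month <- factor(Final_file$Month, levels = study_months)
```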

2. Visualize the data by bike type

Total number of rides by bike type

Total_bikes<-Final_file %>%
  group_by(bike_type) %>% 
  summarise(number_of_rides=n())


ggplot(Total_bikes,aes(x=bike_type,y=number_of_rides, fill=bike_type))+
  geom_col()+
  theme_minimal()+
  labs(title = "Total Rides by Bike Type",x="bike_type",y="Total Counts")+
  scale_fill_brewer(palette = "Blues",direction = -1)+
  geom_text(aes(label=number_of_rides),vjust= -0.2)

Bike Type according to the day of week

Bike_type_wd<- Final_file %>%
  group_by(bike_type,day_of_week) %>% 
  summarise(number_of_rides= n())

## `summarise()` has grouped output by 'bike_type'. You can override using the
## `.groups` argument.

ggplot(Bike_type_wd,aes(x=day_of_week, y=number_of_rides,fill=bike_type))+
  geom_col()+theme_minimal()+facet_wrap(~bike_type)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday", 
                            "Wednesday", "Thursday", "Friday", "Saturday"))+
  labs(title = "Types of Bikes Used by Day of Week")+
  scale_fill_brewer(direction = -1)

Bike Type according to the Month

bike_type_mn<-Final_file %>% 
  group_by(bike_type,Month) %>% 
  summarise(number_of_rides=n())

## `summarise()` has grouped output by 'bike_type'. You can override using the
## `.groups` argument.

ggplot(bike_type_mn,aes(x=Month, y=number_of_rides,fill=bike_type))+
  geom_col()+theme_minimal()+facet_wrap(~bike_type)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  scale_fill_brewer(direction = -1)+
  labs(title = "Types of Bikes Used by Month")

3. Visualize the data by ride length

Total time spent by customer type

Cus_ridelength<-Final_file %>% 
  group_by(customer_type) %>% 
  summarise(time_spent=sum(ride_length))


ggplot(Cus_ridelength,aes(x=customer_type,y=time_spent, fill=customer_type))+
  geom_col()+theme_minimal()+labs(title = "Total Time Spent by Casual Riders and Annual Members",y="Time spent (seconds)")+
  scale_fill_brewer(palette = "Blues",direction = -1)
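The total-time charts plot raw seconds, which produces very large numbers on the y axis. Dividing by 3,600 expresses the totals in hours; in the ggplot call this could be written as y = time_spent / 3600 together with a matching axis label. The helper below is a hypothetical name used only for illustration.

```r
# Hypothetical helper: convert a duration in seconds to hours
secs_to_hours <- function(secs) secs / 3600

hours <- secs_to_hours(c(3600, 5400, 86400))
print(hours)  # 1.0 1.5 24.0
```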

Total time spent by bike type

Bike_ridelength<-Final_file %>%
  group_by(bike_type) %>% 
  summarise(Time_Spent=sum(ride_length))
  
ggplot(Bike_ridelength,aes(x=bike_type,y=Time_Spent,fill=bike_type))+
  geom_col()+theme_minimal()+scale_fill_brewer(palette = "Blues",direction = -1)+
  labs(title = "Total Time Spent by Bike Type")+
  scale_x_discrete(limits=c("classic_bike","electric_bike","docked_bike"))

Total Time Spent According to day of the week

Day<-Final_file %>% 
  group_by(day_of_week) %>% 
  summarise(Time_Spent=sum(ride_length))

ggplot(Day,aes(x=day_of_week,y=Time_Spent,fill=Time_Spent))+
  geom_col()+
  scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday",
                            "Wednesday", "Thursday", "Friday", "Saturday"))+
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Total Time Spent by Day of Week")+
  scale_fill_distiller(direction = -1)

Total time spent by month

Month<-Final_file %>% 
  group_by(Month) %>% 
  summarise(Time_spent=sum(ride_length))

ggplot(Month,aes(x=Month,y=Time_spent,fill=Time_spent))+
  geom_col()+theme_minimal()+
  labs(title = "Total Time Spent According to the Month")

Findings

1. Customer type: Annual members took more rides than casual riders (3,176,679 versus 2,546,194 rides over the year). Both groups used classic bikes the most, and annual members used classic bikes more than casual riders did, followed by electric bikes and then docked bikes. Casual riders take most of their rides on weekends, whereas Tuesday, Wednesday, and Thursday are the peak days for annual members. By month, both casual riders and annual members peak in the middle of the year (May, June, July, and August).

2. Bike type: Classic bikes are used more than electric and docked bikes. By day of week, classic bikes are used most on Saturdays and Sundays, while electric bike use looks roughly even across the week. By month, June and July are the peak months for classic bikes, and October appears to be the busiest month for electric bikes.

3. Ride length: Casual riders spend more total time riding than annual members. Classic bikes account for the most ride time, followed by electric bikes. By day of week, Saturday and Sunday have the most ride time, and by month, the middle of the year is busier than the other months.

Conclusions

The findings show that annual members mostly take short trips (about 802 seconds, or roughly 13 minutes, on average) but take more rides than casual riders. Casual riders, by contrast, take longer rides (about 1,905 seconds, or roughly 32 minutes, on average).

Act

Recommendations

1. We should run an online campaign, possibly featuring popular fitness personalities, that shows people the benefits of cycling every day and how a daily cycling routine supports both physical and mental health.

2. We should offer special discounts or a few free months to customers who purchase an annual membership, and bundle fitness gadgets with the annual membership where possible.

3. We should explain the different uses of the three bike types so customers can choose the right bike for each trip, for example electric bikes for longer distances and classic bikes for short trips within the city, with the aim of increasing annual memberships among electric bike users. We should also show people how an electric bike compares with a petrol motorbike or a car in terms of cost and environmental impact.

Sukhvir Singh

2023-09-25