Case Study 1: How Does a Bike-Share Navigate Speedy Success?
In the previous project, I examined how people use different types of bikes by day and by month, and how much time they spend on each ride. In this project, I look at how male and female riders and different age groups use the bikes, and how much time they spend on rides. To answer the key business questions, I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Key Task
I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. I chose to work with the quarterly data for 2019. This is public data that I will use to explore how different customer types are using Cyclistic bikes. Note, however, that data-privacy issues prohibit me from using riders’ personally identifiable information. This means that I won’t be able to connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or have purchased multiple single passes.
Key tasks
library(ggplot2)
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(readr)
Divvy_Trips_2019_Q1 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q1/Divvy_Trips_2019_Q1.csv")
Divvy_Trips_2019_Q2 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q2/Divvy_Trips_2019_Q2.csv")
Divvy_Trips_2019_Q3 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q3/Divvy_Trips_2019_Q3.csv")
Divvy_Trips_2019_Q4 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q4/Divvy_Trips_2019_Q4.csv")
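# The Q2 file uses different column headers, so reuse the Q1 names (assigned by position).
colnames(Divvy_Trips_2019_Q2) <- colnames(Divvy_Trips_2019_Q1)
# Stack the four quarters into one data frame.
Combine_data <- rbind(Divvy_Trips_2019_Q1, Divvy_Trips_2019_Q2, Divvy_Trips_2019_Q3, Divvy_Trips_2019_Q4)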
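The renaming step above is only safe when the Q2 columns are in the same order as the Q1 columns, because `colnames<-` assigns names by position. Here is a minimal sketch of a check that could run before combining. The tibbles `q1` and `q2`, their values, and the Q2 header names are made up for illustration and are not the real files.

```r
library(dplyr)

# Hypothetical stand-ins for two quarterly files; q2 has different headers.
q1 <- tibble(
  trip_id      = c(1, 2),
  tripduration = c(390, 441),
  usertype     = c("Subscriber", "Customer")
)
q2 <- tibble(
  `Rental ID` = c(3, 4),
  `Duration`  = c(829, 1783),
  `User Type` = c("Subscriber", "Subscriber")
)

# Renaming by position assumes the same number of columns,
# with matching types in the same order, so verify that first.
stopifnot(ncol(q1) == ncol(q2))
stopifnot(identical(unname(sapply(q1, class)), unname(sapply(q2, class))))

colnames(q2) <- colnames(q1)

# bind_rows() matches columns by name and fills missing columns with NA.
combined <- bind_rows(q1, q2)
nrow(combined)
```

A type check like this will not catch two columns of the same type that have been swapped, so a quick look at the first few rows of each file is still worthwhile.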
glimpse(Combine_data)
## Rows: 3,818,004
## Columns: 12
## $ trip_id <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 21…
## $ start_time <dttm> 2019-01-01 00:04:37, 2019-01-01 00:08:13, 2019-01-0…
## $ end_time <dttm> 2019-01-01 00:11:07, 2019-01-01 00:15:34, 2019-01-0…
## $ bikeid <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205,…
## $ tripduration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, …
## $ from_station_id <dbl> 199, 44, 15, 123, 173, 98, 98, 211, 150, 268, 299, 2…
## $ from_station_name <chr> "Wabash Ave & Grand Ave", "State St & Randolph St", …
## $ to_station_id <dbl> 84, 624, 644, 176, 35, 49, 49, 142, 148, 141, 295, 4…
## $ to_station_name <chr> "Milwaukee Ave & Grand Ave", "Dearborn St & Van Bure…
## $ usertype <chr> "Subscriber", "Subscriber", "Subscriber", "Subscribe…
## $ gender <chr> "Male", "Female", "Female", "Male", "Male", "Female"…
## $ birthyear <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995…
str(Combine_data)
## spc_tbl_ [3,818,004 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ trip_id : num [1:3818004] 21742443 21742444 21742445 21742446 21742447 ...
## $ start_time : POSIXct[1:3818004], format: "2019-01-01 00:04:37" "2019-01-01 00:08:13" ...
## $ end_time : POSIXct[1:3818004], format: "2019-01-01 00:11:07" "2019-01-01 00:15:34" ...
## $ bikeid : num [1:3818004] 2167 4386 1524 252 1170 ...
## $ tripduration : num [1:3818004] 390 441 829 1783 364 ...
## $ from_station_id : num [1:3818004] 199 44 15 123 173 98 98 211 150 268 ...
## $ from_station_name: chr [1:3818004] "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ to_station_id : num [1:3818004] 84 624 644 176 35 49 49 142 148 141 ...
## $ to_station_name : chr [1:3818004] "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ usertype : chr [1:3818004] "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
## $ gender : chr [1:3818004] "Male" "Female" "Female" "Male" ...
## $ birthyear : num [1:3818004] 1989 1990 1994 1993 1994 ...
## - attr(*, "spec")=
## .. cols(
## .. trip_id = col_double(),
## .. start_time = col_datetime(format = ""),
## .. end_time = col_datetime(format = ""),
## .. bikeid = col_double(),
## .. tripduration = col_number(),
## .. from_station_id = col_double(),
## .. from_station_name = col_character(),
## .. to_station_id = col_double(),
## .. to_station_name = col_character(),
## .. usertype = col_character(),
## .. gender = col_character(),
## .. birthyear = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
Key tasks
Combine_data_01<-Combine_data %>%
select(trip_id,bikeid,tripduration,usertype,gender,birthyear)
colnames(Combine_data_01)
## [1] "trip_id" "bikeid" "tripduration" "usertype" "gender"
## [6] "birthyear"
Combine_data_02<-Combine_data_01 %>%
mutate(current_year= 2019)
glimpse(Combine_data_02)
## Rows: 3,818,004
## Columns: 7
## $ trip_id <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 2174244…
## $ bikeid <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 3939…
## $ tripduration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886, …
## $ usertype <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", "…
## $ gender <chr> "Male", "Female", "Female", "Male", "Male", "Female", "Ma…
## $ birthyear <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 199…
## $ current_year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 201…
sum(duplicated(Combine_data_02))
## [1] 0
sum(is.na(Combine_data_02))
## [1] 1097957
Combine_data_02_Na<-drop_na(Combine_data_02)
sum(is.na(Combine_data_02_Na))
## [1] 0
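Note that `sum(is.na(...))` counts missing cells, not rows, so a row that lacks both gender and birth year is counted twice. In this run, 1,097,957 missing cells correspond to 3,818,004 - 3,258,796 = 559,208 dropped rows (the second figure is the row count reported by the next `glimpse()`). A small sketch of a per-column breakdown, using a made-up tibble (`trips` and its values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical sample with missing gender and birth year values.
trips <- tibble(
  trip_id   = 1:5,
  gender    = c("Male", NA, "Female", NA, "Male"),
  birthyear = c(1989, NA, 1990, 1985, NA)
)

# Missing cells per column.
na_per_column <- colSums(is.na(trips))

# Missing cells in total versus rows that drop_na() would remove.
total_na_cells <- sum(is.na(trips))
rows_with_na   <- sum(!complete.cases(trips))
rows_kept      <- nrow(drop_na(trips))

na_per_column
```

If the missing values turn out to be concentrated in one user type, dropping them changes the mix of riders in the analysis, so it may be worth checking the breakdown by `usertype` before calling `drop_na()`.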
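# Note: despite its name, year_of_birth stores the rider's age in 2019 (current_year - birthyear).
Combine_data_02_Na_F <- Combine_data_02_Na %>%
  mutate(year_of_birth = current_year - birthyear)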
glimpse(Combine_data_02_Na_F)
## Rows: 3,258,796
## Columns: 8
## $ trip_id <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 217424…
## $ bikeid <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 393…
## $ tripduration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886,…
## $ usertype <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", …
## $ gender <chr> "Male", "Female", "Female", "Male", "Male", "Female", "M…
## $ birthyear <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 19…
## $ current_year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
## $ year_of_birth <dbl> 30, 29, 25, 26, 25, 36, 35, 29, 24, 23, 25, 25, 33, 29, …
Key tasks
Combine_data_02_Na_F %>%
distinct(year_of_birth) %>%
arrange(year_of_birth)
## # A tibble: 89 × 1
## year_of_birth
## <dbl>
## 1 5
## 2 16
## 3 17
## 4 18
## 5 19
## 6 20
## 7 21
## 8 22
## 9 23
## 10 24
## # ℹ 79 more rows
Combine_data_02_Na_F %>%
filter(year_of_birth == 5)
## # A tibble: 5 × 8
## trip_id bikeid tripduration usertype gender birthyear current_year
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 22463474 6225 7209 Subscriber Female 2014 2019
## 2 22483110 6391 4515 Subscriber Female 2014 2019
## 3 22634065 2076 8469 Subscriber Female 2014 2019
## 4 22670749 2076 175251 Subscriber Female 2014 2019
## 5 22895143 2334 2479420 Subscriber Female 2014 2019
## # ℹ 1 more variable: year_of_birth <dbl>
# Drop the implausible age-5 records found above by keeping riders aged 16 and over.
Combine_data_02_Na_F_1 <- Combine_data_02_Na_F %>%
  filter(year_of_birth >= 16)
Combine_data_02_Na_F_1 %>%
distinct(year_of_birth) %>%
arrange(year_of_birth)
## # A tibble: 88 × 1
## year_of_birth
## <dbl>
## 1 16
## 2 17
## 3 18
## 4 19
## 5 20
## 6 21
## 7 22
## 8 23
## 9 24
## 10 25
## # ℹ 78 more rows
Combine_data_02_Na_F_1 %>%
group_by(gender) %>%
summarise(min_trip= min(tripduration),max_trip = max(tripduration),avg_trip=mean(tripduration))
## # A tibble: 2 × 4
## gender min_trip max_trip avg_trip
## <chr> <dbl> <dbl> <dbl>
## 1 Female 61 8203637 1301.
## 2 Male 61 9056633 987.
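The maximum durations above (over 8 and 9 million seconds, roughly 95 and 105 days) suggest extreme outliers that pull the mean upward. A sketch of a more robust comparison that reports the median alongside the mean, using made-up data (`trips` and its values are hypothetical):

```r
library(dplyr)

# Hypothetical sample: one extreme ride among the female riders.
trips <- tibble(
  gender       = c("Female", "Female", "Female", "Male", "Male", "Male"),
  tripduration = c(300, 600, 900000, 200, 400, 600)
)

# The median is far less sensitive to a single very long ride than the mean.
robust_summary <- trips %>%
  group_by(gender) %>%
  summarise(
    avg_trip    = mean(tripduration),
    median_trip = median(tripduration),
    .groups     = "drop"
  )

robust_summary
```

In this toy example the single 900,000-second ride lifts the female mean to 300,300 seconds while the median stays at 600, which is the kind of gap worth checking for before comparing average ride times between groups.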
# Bin riders into age bands. The last label is "94-119" even though its condition
# covers ages 91-119. Ages above 119 match no condition and become NA.
Combine_data_Final <- Combine_data_02_Na_F_1 %>%
  mutate(age_cat = case_when(year_of_birth >= 16 & year_of_birth <= 30 ~ "16-30",
                             year_of_birth >= 31 & year_of_birth <= 50 ~ "31-50",
                             year_of_birth >= 51 & year_of_birth <= 70 ~ "51-70",
                             year_of_birth >= 71 & year_of_birth <= 90 ~ "71-90",
                             year_of_birth >= 91 & year_of_birth <= 119 ~ "94-119"))
glimpse(Combine_data_Final)
## Rows: 3,258,791
## Columns: 9
## $ trip_id <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 217424…
## $ bikeid <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 393…
## $ tripduration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886,…
## $ usertype <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", …
## $ gender <chr> "Male", "Female", "Female", "Male", "Male", "Female", "M…
## $ birthyear <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 19…
## $ current_year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
## $ year_of_birth <dbl> 30, 29, 25, 26, 25, 36, 35, 29, 24, 23, 25, 25, 33, 29, …
## $ age_cat <chr> "16-30", "16-30", "16-30", "16-30", "16-30", "31-50", "3…
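The same binning can also be written with base R's `cut()`, which keeps the break points and labels side by side. This is a sketch, not the method used above. It assumes whole-number ages and the same five bands, and the label `91-119` is my choice for the last band.

```r
# Hypothetical ages covering every band edge, plus one value above the top band.
ages <- c(16, 30, 31, 50, 51, 70, 71, 90, 91, 119, 120)

age_cat <- cut(
  ages,
  breaks = c(15, 30, 50, 70, 90, 119),  # right-closed intervals: (15,30], (30,50], ...
  labels = c("16-30", "31-50", "51-70", "71-90", "91-119")
)

as.character(age_cat)
```

Values outside the breaks, such as 120 here, come back as `NA`, which mirrors what `case_when()` does when no condition matches.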
sum(is.na(Combine_data_Final))
## [1] 136
Combine_data_Final<-drop_na(Combine_data_Final)
Combine_data_Final %>%
group_by(age_cat) %>%
summarise(count= n())
## # A tibble: 5 × 2
## age_cat count
## <chr> <int>
## 1 16-30 1463681
## 2 31-50 1414519
## 3 51-70 372814
## 4 71-90 6789
## 5 94-119 852
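Because the business question is how casual riders and annual members differ, it may help to split these counts by `usertype` as well. A sketch on made-up rows: `trips` and its values are hypothetical, and I am assuming casual riders appear as `Customer` in the `usertype` column.

```r
library(dplyr)

# Hypothetical sample of cleaned trips.
trips <- tibble(
  usertype = c("Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"),
  gender   = c("Male", "Female", "Male", "Female", "Male"),
  age_cat  = c("16-30", "31-50", "31-50", "16-30", "16-30")
)

# Rides per user type and age band, plus each band's share within its user type.
by_type_age <- trips %>%
  count(usertype, age_cat, name = "rides") %>%
  group_by(usertype) %>%
  mutate(share = rides / sum(rides)) %>%
  ungroup()

by_type_age
```

Replacing `age_cat` with `gender` in the `count()` call gives the equivalent breakdown by gender.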
view(Combine_data_Final)
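For the share step, the age-group counts can be shown as a simple bar chart. This is a minimal sketch that re-enters the counts from the summary table above by hand. The object names `age_counts` and `p` and the chart title are my own, and in the real workflow the plot could be built from `Combine_data_Final` directly.

```r
library(ggplot2)

# Counts copied from the age_cat summary table above.
age_counts <- data.frame(
  age_cat = c("16-30", "31-50", "51-70", "71-90", "94-119"),
  count   = c(1463681, 1414519, 372814, 6789, 852)
)

# One bar per age band, with height equal to the number of rides.
p <- ggplot(age_counts, aes(x = age_cat, y = count)) +
  geom_col() +
  labs(title = "Rides by age group, 2019", x = "Age group", y = "Number of rides")

p
```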
Recommendations
We should target riders in the 31-50 age category with dedicated online campaigns. Where possible, we should also survey them to learn what they want, so that we can convert casual riders into annual members.
We should also focus more on female riders, since there is room to increase their ride count and convert them to annual memberships. An online survey would help us learn their preferences, and digital marketing and promotions can support the effort.