Case Study 1: How Does a Bike-Share Navigate Speedy Success?

Introduction

In the previous project I looked at how people use the different types of bikes by day and by month, and how much time they spend on each ride. In this project I look at how male and female riders and different age groups use the bikes, and how much time they spend on rides. To answer the key business questions, I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Ask

Key tasks

  1. Total number of rides taken by male and female riders.
  2. Trip duration by gender.
  3. Total number of rides taken by each age group.
  4. Trip duration by age group.

Prepare

I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. I chose to work with the four quarterly data files for 2019. This is public data that I will use to explore how different customer types use Cyclistic bikes. Note that data-privacy rules prohibit the use of riders’ personally identifiable information. This means I cannot connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or have purchased multiple single passes.

Key tasks

  1. Download data and store it appropriately.
  2. Identify how it’s organized.
  3. Sort and filter the data.
  4. Determine the credibility of the data.
library(ggplot2)
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(readr)
# Read the four quarterly trip files for 2019
Divvy_Trips_2019_Q1 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q1/Divvy_Trips_2019_Q1.csv")
Divvy_Trips_2019_Q2 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q2/Divvy_Trips_2019_Q2.csv")
Divvy_Trips_2019_Q3 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q3/Divvy_Trips_2019_Q3.csv")
Divvy_Trips_2019_Q4 <- read_csv("C:/Users/SUKHVIR/Downloads/Divvy_Trips_2019_Q4/Divvy_Trips_2019_Q4.csv")
# Give Q2 the same column names as Q1 so the four files can be stacked
colnames(Divvy_Trips_2019_Q2)<- colnames(Divvy_Trips_2019_Q1)
# Stack the four quarters into a single data frame
Combine_data <-rbind(Divvy_Trips_2019_Q1,Divvy_Trips_2019_Q2, Divvy_Trips_2019_Q3,Divvy_Trips_2019_Q4)
glimpse(Combine_data)
## Rows: 3,818,004
## Columns: 12
## $ trip_id           <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 21…
## $ start_time        <dttm> 2019-01-01 00:04:37, 2019-01-01 00:08:13, 2019-01-0…
## $ end_time          <dttm> 2019-01-01 00:11:07, 2019-01-01 00:15:34, 2019-01-0…
## $ bikeid            <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205,…
## $ tripduration      <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, …
## $ from_station_id   <dbl> 199, 44, 15, 123, 173, 98, 98, 211, 150, 268, 299, 2…
## $ from_station_name <chr> "Wabash Ave & Grand Ave", "State St & Randolph St", …
## $ to_station_id     <dbl> 84, 624, 644, 176, 35, 49, 49, 142, 148, 141, 295, 4…
## $ to_station_name   <chr> "Milwaukee Ave & Grand Ave", "Dearborn St & Van Bure…
## $ usertype          <chr> "Subscriber", "Subscriber", "Subscriber", "Subscribe…
## $ gender            <chr> "Male", "Female", "Female", "Male", "Male", "Female"…
## $ birthyear         <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995…
str(Combine_data)
## spc_tbl_ [3,818,004 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ trip_id          : num [1:3818004] 21742443 21742444 21742445 21742446 21742447 ...
##  $ start_time       : POSIXct[1:3818004], format: "2019-01-01 00:04:37" "2019-01-01 00:08:13" ...
##  $ end_time         : POSIXct[1:3818004], format: "2019-01-01 00:11:07" "2019-01-01 00:15:34" ...
##  $ bikeid           : num [1:3818004] 2167 4386 1524 252 1170 ...
##  $ tripduration     : num [1:3818004] 390 441 829 1783 364 ...
##  $ from_station_id  : num [1:3818004] 199 44 15 123 173 98 98 211 150 268 ...
##  $ from_station_name: chr [1:3818004] "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ to_station_id    : num [1:3818004] 84 624 644 176 35 49 49 142 148 141 ...
##  $ to_station_name  : chr [1:3818004] "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ usertype         : chr [1:3818004] "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
##  $ gender           : chr [1:3818004] "Male" "Female" "Female" "Male" ...
##  $ birthyear        : num [1:3818004] 1989 1990 1994 1993 1994 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   trip_id = col_double(),
##   ..   start_time = col_datetime(format = ""),
##   ..   end_time = col_datetime(format = ""),
##   ..   bikeid = col_double(),
##   ..   tripduration = col_number(),
##   ..   from_station_id = col_double(),
##   ..   from_station_name = col_character(),
##   ..   to_station_id = col_double(),
##   ..   to_station_name = col_character(),
##   ..   usertype = col_character(),
##   ..   gender = col_character(),
##   ..   birthyear = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
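
Because `rbind()` on data frames expects matching column names, a quick check before stacking can catch a quarter whose headers differ. The sketch below is only an illustration in base R: the two small data frames (`q1_demo`, `q2_demo`), their contents, and the helper `same_columns()` are made up for the example and are not part of the analysis.

```r
# Toy stand-ins for two quarterly files; the second uses different headers.
q1_demo <- data.frame(trip_id = c(1, 2), tripduration = c(390, 441))
q2_demo <- data.frame(`Rental ID` = c(3, 4), `Duration` = c(829, 1783),
                      check.names = FALSE)

# Hypothetical helper: TRUE when both frames have the same column names
# in the same order.
same_columns <- function(a, b) identical(names(a), names(b))

headers_match_before <- same_columns(q1_demo, q2_demo)

# Renaming by position is only safe when both files hold the same fields
# in the same order, so confirm at least the column count first.
stopifnot(ncol(q1_demo) == ncol(q2_demo))
names(q2_demo) <- names(q1_demo)

combined_demo <- rbind(q1_demo, q2_demo)
nrow(combined_demo)
```

Renaming by position silently mislabels data if the column order differs between files, so it is worth comparing a few rows by eye as well.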

Process

Key tasks

  1. Check the data for errors.
  2. Choose your tools.
  3. Transform the data so you can work with it effectively.
  4. Document the cleaning process.
# Keep only the columns needed for the gender and age analysis
Combine_data_01<-Combine_data %>% 
  select(trip_id,bikeid,tripduration,usertype,gender,birthyear)
colnames(Combine_data_01)
## [1] "trip_id"      "bikeid"       "tripduration" "usertype"     "gender"      
## [6] "birthyear"
# Add the year of the data so that age can be derived from birthyear
Combine_data_02<-Combine_data_01 %>% 
  mutate(current_year= 2019)
glimpse(Combine_data_02)
## Rows: 3,818,004
## Columns: 7
## $ trip_id      <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 2174244…
## $ bikeid       <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 3939…
## $ tripduration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886, …
## $ usertype     <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", "…
## $ gender       <chr> "Male", "Female", "Female", "Male", "Male", "Female", "Ma…
## $ birthyear    <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 199…
## $ current_year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 201…
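# Check for fully duplicated rows
sum(duplicated(Combine_data_02))
## [1] 0
# Count missing cells (not rows) across all columns
sum(is.na(Combine_data_02))
## [1] 1097957
# Drop every row that has at least one missing value
Combine_data_02_Na<-drop_na(Combine_data_02)
sum(is.na(Combine_data_02_Na))
## [1] 0
# Despite its name, year_of_birth holds the rider's age in 2019
Combine_data_02_Na_F<- Combine_data_02_Na %>% 
  mutate(year_of_birth=current_year - birthyear )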

glimpse(Combine_data_02_Na_F)
## Rows: 3,258,796
## Columns: 8
## $ trip_id       <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 217424…
## $ bikeid        <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 393…
## $ tripduration  <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886,…
## $ usertype      <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", …
## $ gender        <chr> "Male", "Female", "Female", "Male", "Male", "Female", "M…
## $ birthyear     <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 19…
## $ current_year  <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
## $ year_of_birth <dbl> 30, 29, 25, 26, 25, 36, 35, 29, 24, 23, 25, 25, 33, 29, …
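
`sum(is.na(...))` counts missing cells rather than rows, which is why the total above (1,097,957) is larger than the number of rows that `drop_na()` removed (3,818,004 - 3,258,796 = 559,208). A per-column count makes the difference easier to see. The following is a minimal base R sketch on an invented data frame; `trips_demo` and its values are assumptions for illustration only.

```r
# Invented example data with missing gender and birthyear values.
trips_demo <- data.frame(
  trip_id   = 1:5,
  gender    = c("Male", NA, "Female", NA, "Male"),
  birthyear = c(1989, NA, 1990, 1994, NA)
)

na_cells  <- sum(is.na(trips_demo))            # missing cells across all columns
na_by_col <- colSums(is.na(trips_demo))        # missing values per column
na_rows   <- sum(!complete.cases(trips_demo))  # rows with at least one missing value

na_cells
na_by_col
na_rows
```

A row that is missing both `gender` and `birthyear` adds two to the cell count but only one to the row count.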

Analyze

Key tasks

  1. Aggregate your data so it’s useful and accessible.
  2. Organize and format your data.
  3. Perform calculations.
  4. Identify trends and relationships.
Combine_data_02_Na_F %>% 
  distinct(year_of_birth) %>% 
  arrange(year_of_birth) 
## # A tibble: 89 × 1
##    year_of_birth
##            <dbl>
##  1             5
##  2            16
##  3            17
##  4            18
##  5            19
##  6            20
##  7            21
##  8            22
##  9            23
## 10            24
## # ℹ 79 more rows
Combine_data_02_Na_F %>% 
 filter(year_of_birth == 5)
## # A tibble: 5 × 8
##    trip_id bikeid tripduration usertype   gender birthyear current_year
##      <dbl>  <dbl>        <dbl> <chr>      <chr>      <dbl>        <dbl>
## 1 22463474   6225         7209 Subscriber Female      2014         2019
## 2 22483110   6391         4515 Subscriber Female      2014         2019
## 3 22634065   2076         8469 Subscriber Female      2014         2019
## 4 22670749   2076       175251 Subscriber Female      2014         2019
## 5 22895143   2334      2479420 Subscriber Female      2014         2019
## # ℹ 1 more variable: year_of_birth <dbl>
# Remove the implausible age-5 records; the next-youngest age in the data is 16
Combine_data_02_Na_F_1<- Combine_data_02_Na_F %>% 
  filter(year_of_birth >= 16)


Combine_data_02_Na_F_1 %>% 
  distinct(year_of_birth) %>% 
  arrange(year_of_birth)
## # A tibble: 88 × 1
##    year_of_birth
##            <dbl>
##  1            16
##  2            17
##  3            18
##  4            19
##  5            20
##  6            21
##  7            22
##  8            23
##  9            24
## 10            25
## # ℹ 78 more rows
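
The filter above handles the low end of the age range, while the `case_when()` further down stops at 119. Both bounds could be applied in one step, with a count of how many rows each choice drops. This is a sketch in base R on made-up data: `ages_demo`, `min_age`, and `max_age` are illustrative names, and the bounds simply mirror the cutoffs already used in this analysis.

```r
# Made-up ages, including one too low and one too high.
ages_demo <- data.frame(trip_id = 1:6, age = c(5, 16, 29, 45, 98, 131))

min_age <- 16    # assumption: same lower cutoff as the filter above
max_age <- 119   # assumption: same upper limit as the oldest age bucket

# Keep rows whose age falls inside the accepted range.
keep           <- ages_demo$age >= min_age & ages_demo$age <= max_age
plausible_demo <- ages_demo[keep, ]
n_dropped      <- sum(!keep)

nrow(plausible_demo)
n_dropped
```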
Combine_data_02_Na_F_1 %>% 
  group_by(gender) %>% 
  summarise(min_trip= min(tripduration),max_trip = max(tripduration),avg_trip=mean(tripduration))
## # A tibble: 2 × 4
##   gender min_trip max_trip avg_trip
##   <chr>     <dbl>    <dbl>    <dbl>
## 1 Female       61  8203637    1301.
## 2 Male         61  9056633     987.
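
The maximum trip durations above run to millions of seconds (roughly 95 to 105 days), so a handful of extreme rides can pull the mean upward. A median is less sensitive to such values. The sketch below uses invented numbers in base R to show the effect; `rides_demo` is a hypothetical data frame, not the project data.

```r
# Invented rides in seconds; one Female ride is an extreme outlier.
rides_demo <- data.frame(
  gender       = c("Female", "Female", "Female", "Male", "Male", "Male"),
  tripduration = c(300, 600, 900000, 200, 400, 600)
)

# Mean and median trip duration per gender.
mean_by_gender   <- tapply(rides_demo$tripduration, rides_demo$gender, mean)
median_by_gender <- tapply(rides_demo$tripduration, rides_demo$gender, median)

mean_by_gender
median_by_gender
```

In this toy data the single long ride moves the Female mean far above the Male mean, while the medians stay close to a typical ride.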
# Bucket riders into age groups; ages above 119 match no condition and become NA
Combine_data_Final<-Combine_data_02_Na_F_1 %>% 
  mutate(age_cat = case_when(year_of_birth >= 16 & year_of_birth <= 30 ~ "16-30",
                             year_of_birth >= 31 & year_of_birth <= 50 ~  "31-50",
                             year_of_birth >= 51 & year_of_birth <= 70 ~  "51-70",
                             year_of_birth >= 71 & year_of_birth <= 90 ~   "71-90",
                             year_of_birth >= 91 & year_of_birth <=119 ~ "91-119"))

glimpse(Combine_data_Final)
## Rows: 3,258,791
## Columns: 9
## $ trip_id       <dbl> 21742443, 21742444, 21742445, 21742446, 21742447, 217424…
## $ bikeid        <dbl> 2167, 4386, 1524, 252, 1170, 2437, 2708, 2796, 6205, 393…
## $ tripduration  <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336, 886,…
## $ usertype      <chr> "Subscriber", "Subscriber", "Subscriber", "Subscriber", …
## $ gender        <chr> "Male", "Female", "Female", "Male", "Male", "Female", "M…
## $ birthyear     <dbl> 1989, 1990, 1994, 1993, 1994, 1983, 1984, 1990, 1995, 19…
## $ current_year  <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 20…
## $ year_of_birth <dbl> 30, 29, 25, 26, 25, 36, 35, 29, 24, 23, 25, 25, 33, 29, …
## $ age_cat       <chr> "16-30", "16-30", "16-30", "16-30", "16-30", "31-50", "3…
# The 136 missing values are age_cat entries for ages above 119; drop those rows
sum(is.na(Combine_data_Final))
## [1] 136
Combine_data_Final<-drop_na(Combine_data_Final)

# Number of rides in each age group
Combine_data_Final %>% 
  group_by(age_cat) %>% 
  summarise(count= n())
## # A tibble: 5 × 2
##   age_cat   count
##   <chr>     <int>
## 1 16-30   1463681
## 2 31-50   1414519
## 3 51-70    372814
## 4 71-90      6789
## 5 91-119     852
view(Combine_data_Final)
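
The age buckets can also be built with base R's `cut()`, which takes one vector of break points, so the labels can be generated from the same numbers that define the ranges. This is a minimal sketch with made-up ages, offered as an alternative rather than a replacement for the `case_when()` approach; `age_breaks`, `age_labels`, and `ages_demo` are illustrative names.

```r
# Upper edge of each bucket; the first value is one below the youngest age kept.
age_breaks <- c(15, 30, 50, 70, 90, 119)

# Build labels such as "16-30" from the break points themselves.
age_labels <- paste0(head(age_breaks, -1) + 1, "-", age_breaks[-1])

ages_demo <- c(16, 30, 31, 50, 75, 119, 131)

# right = TRUE makes each interval include its upper edge, e.g. (15, 30].
age_cat_demo <- cut(ages_demo, breaks = age_breaks, labels = age_labels,
                    right = TRUE)

age_labels
as.character(age_cat_demo)
```

As with `case_when()`, any age outside the break points (131 here) comes back as `NA` and would need to be dropped or handled explicitly.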

Share

Key tasks

  1. Determine the best way to share your findings.
  2. Create effective data visualizations.
  3. Present your findings.
  4. Ensure your work is accessible.

I will visualize the data according to the gender type and age category.

  1. Gender-wise
Total_Gender_Per<-Combine_data_Final %>% 
  group_by(gender) %>%
  summarise(count=n()) %>% 
  mutate(Percent=paste0(round(count/sum(count)*100,2),"%"))


ggplot(Total_Gender_Per,aes(x=gender,y=count,fill =count))+
  geom_col()+theme_minimal()+
  geom_text(aes(label=Percent),vjust=-0.4)+
  labs(title = "Total Count of Men & Women")

trip_percent<- Combine_data_Final %>% 
  group_by(gender) %>% 
  summarise(Count=sum(tripduration)) %>% 
  mutate(percent= paste0(round(Count/sum(Count)*100,2),"%"))

ggplot(trip_percent,aes(x=gender,y=Count,fill=Count))+
  geom_col()+theme_minimal()+
  geom_text(aes(label=percent),vjust=-0.4)+
  labs(title = "Trip Duration According to Male And Female")

  2. Age category
Total_count_Age<-Combine_data_Final %>% 
  group_by(age_cat) %>% 
  summarise(count=n()) %>% 
  mutate(Percent=paste0(round(count/sum(count)*100,2),"%"))

ggplot(Total_count_Age,aes(x=age_cat,y=count,fill=count))+
  geom_col()+theme_minimal()+
  geom_text(aes(label=Percent), vjust=-0.4)+
  labs(title = "Total Count According to the Different Age Group")

Trip_duration_age<- Combine_data_Final %>% 
  group_by(age_cat) %>% 
  summarise(Count=sum(tripduration)) %>% 
  mutate(Percent=paste0(round(Count/sum(Count)*100,2),"%"))


ggplot(Trip_duration_age,aes(x=age_cat,y=Count,fill=Count))+
  geom_col()+theme_minimal()+
  geom_text(aes(label=Percent),vjust=- 0.4)+
  labs(title = "Total Trip Duration According to the Different Age Group")

Findings

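1. Gender-wise: Male riders made most of the trips, with 73.67% of rides against 26.33% for female riders. Male riders also account for the larger share of total trip duration (67.99%, against 32.01% for female riders), although the summary table in the Analyze step shows a longer average ride for female riders (about 1,301 seconds versus 987 seconds).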

2. Age category: Riders aged 16 to 50 account for about 88.3% of all rides, and this group also has the highest total trip duration.
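
The age-group share can be recomputed directly from the count table in the Analyze step. The base R sketch below copies those five counts into a vector (`age_counts` is an illustrative name, and the oldest bucket is labelled "91+" here only for brevity). It also shows that adding two rounded percentages can differ slightly from rounding the combined share.

```r
# Ride counts per age group, copied from the summary table above.
age_counts <- c("16-30" = 1463681, "31-50" = 1414519, "51-70" = 372814,
                "71-90" = 6789, "91+" = 852)

# Share of rides per group, in percent, rounded to two decimals.
share_pct <- round(100 * age_counts / sum(age_counts), 2)

# Combined share of the two youngest groups, computed two ways.
sum_of_rounded <- share_pct[["16-30"]] + share_pct[["31-50"]]
share_16_50    <- round(100 * sum(age_counts[c("16-30", "31-50")]) /
                          sum(age_counts), 2)

share_pct
sum_of_rounded
share_16_50
```

The two results differ only in the second decimal place, so the finding itself is unaffected.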

Conclusion

To conclude, male riders lead female riders in both ride count and total trip duration, and riders aged 16 to 50 lead in both ride count and total trip duration.

Act

Recommendations

  1. We should target people in the 31 to 50 age category with dedicated online campaigns. If possible, we should also run surveys to learn what they want, so that we can convert casual riders into annual members.

  2. We should also focus more on female riders, since there is room to increase their ride count and convert them into annual members. An online survey could reveal their preferences, and digital marketing and promotions could support the effort.