Case Study

How does a bike-share navigate speedy success?

Business Problem:

How casual riders and annual members use Cyclistic bikes differently

Background

This case study is the capstone project for Course 8 of the Google Data Analytics Certificate. It demonstrates the data analytics skills I learned throughout the certification.

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

About the data

The data used for this project comes from Divvy. The Divvy data contains quarterly data sets of bike-share trips from 2013 to 2022. For this case study, we will work with the 2019 and 2020 data.

Structure of the Case Study

There are 2 parts in this case study. The first part deals with data from the first quarter of 2019. We will mainly focus on the gender and age variables here because those variables were dropped from the succeeding data sets.

The second part focuses on the last 3 quarters of 2019 and the first quarter of 2020. The larger data set gives us more observations to work with and hence more robust results. It also lets us check for seasonality and monthly variation in the data.

Methodology

R was used for the analysis in this case study. The other options were MySQL, Excel, and Tableau. I chose R because it can handle every aspect of this case study. It has strong:

  • Data cleaning capabilities
  • Visualization prowess
  • Analytics
  • Publishing

It is an all-in-one tool, whereas the other tools mentioned would have had to be used in combination.

The Business problem

The business problem is to identify how customers, the casual users of the bike-share service, differ from members who hold an annual subscription. By studying how they differ, we can put forward recommendations for converting customers into members.

Part 1

Previewing the data

Data summary
Name                     trips
Number of rows           365069
Number of columns        12
_______________________
Column type frequency:
character                4
numeric                  6
POSIXct                  2
________________________
Group variables          None

Variable type: character

skim_variable      n_missing  complete_rate  min  max  empty  n_unique  whitespace
from_station_name          0           1.00   10   43      0       594           0
to_station_name            0           1.00   10   43      0       600           0
usertype                   0           1.00    8   10      0         2           0
gender                 19711           0.95    4    6      0         2           0

Variable type: numeric

skim_variable    n_missing  complete_rate         mean         sd        p0       p25       p50       p75      p100
trip_id                  0           1.00  21960871.66  127175.00  21742443  21848765  21961829  22071823  22178528
bikeid                   0           1.00      3429.48    1923.32         1      1777      3489      5157      6471
tripduration             0           1.00      1016.34   27913.51        61       326       524       866  10628400
from_station_id          0           1.00       198.09     153.49         2        76       170       287       665
to_station_id            0           1.00       198.58     154.47         2        76       168       287       665
birthyear            18023           0.95      1981.67      11.25      1900      1975      1985      1990      2003

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min                  max                  median               n_unique
start_time             0              1  2019-01-01 00:04:37  2019-03-31 23:53:48  2019-02-25 07:52:56    343022
end_time               0              1  2019-01-01 00:11:07  2019-06-17 16:04:35  2019-02-25 08:03:50    338367

Counting rides by user type
Usertype Count
Customer 5934
Subscriber 339423

We can see that far more Subscribers than Customers use the bikes.
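A count like the one above can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the code behind the report: the column name `usertype` comes from the data preview, while the toy rows and the name `usertype_counts` are assumptions made for the example.

```r
# Toy stand-in for the trips data; the rows are made up for illustration.
trips <- data.frame(
  usertype = c("Subscriber", "Customer", "Subscriber", "Subscriber", "Customer")
)

# Tabulate rides per user type and turn the result into a two-column data frame.
usertype_counts <- as.data.frame(
  table(usertype = trips$usertype),
  responseName = "count"
)

print(usertype_counts)
```

On the real data, the same call would be applied to the full `trips` table after any filtering step.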

Finding the mean duration for customers and subscribers
Usertype Mean Duration
Customer 37.0 mins
Subscriber 13.9 mins

We see that although there are far more Subscribers than Customers, Customers spend much more time on a bike per ride than Subscribers do.
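The comparison above amounts to a grouped mean. The sketch below shows one way to compute it in base R, assuming `tripduration` is recorded in seconds (which matches the magnitudes in the data preview) and using made-up rows; `duration_mins` and `mean_duration` are hypothetical names.

```r
# Toy stand-in for the trips data; durations are assumed to be in seconds.
trips <- data.frame(
  usertype = c("Subscriber", "Customer", "Subscriber", "Subscriber", "Customer"),
  tripduration = c(300, 2400, 540, 720, 1800)
)

# Convert seconds to minutes so the result is easier to read.
trips$duration_mins <- trips$tripduration / 60

# Mean ride duration (minutes) for each user type.
mean_duration <- aggregate(duration_mins ~ usertype, data = trips, FUN = mean)

print(mean_duration)
```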

Checking how many riders are male and how many are female
gender Count
Female 66918
Male 278439

A disproportionate share of our riders are male: 278,439 male riders versus 66,918 female riders, or roughly 81% versus 19%. This suggests we should focus more on marketing the bikes to women.
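To put the imbalance in proportion, the two counts from the table above can be converted to percentage shares. The snippet uses only the counts reported in this section; the variable names are illustrative.

```r
# Counts taken from the gender table above.
gender_counts <- c(Female = 66918, Male = 278439)

# Share of each gender as a percentage of all rides with a recorded gender.
gender_share <- round(100 * prop.table(gender_counts), 1)

print(gender_share)  # Female 19.4, Male 80.6
```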

Combining the gender and usertype data into the same graph

Finding unique bikes, start_stations and end stations
category count
unique bikes 4755
unique start stations 587
unique end stations 592
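A table of distinct counts like this one could be built as sketched below. The column names follow the data preview; the toy rows and the name `unique_counts` are assumptions for illustration.

```r
# Toy stand-in for the trips data; the IDs are made up for illustration.
trips <- data.frame(
  bikeid = c(11, 12, 11, 13),
  from_station_id = c(2, 2, 76, 170),
  to_station_id = c(76, 170, 76, 170)
)

# Count distinct bikes, start stations, and end stations.
unique_counts <- data.frame(
  category = c("unique bikes", "unique start stations", "unique end stations"),
  count = c(
    length(unique(trips$bikeid)),
    length(unique(trips$from_station_id)),
    length(unique(trips$to_station_id))
  )
)

print(unique_counts)
```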

How much time each gender spends on a bike

gender mean_duration
Female 15
Male 14

This figure shows that although there are more male riders than female riders, female riders spend slightly more time on the bike on average (15 versus 14), which supports the idea that female riders could bring in more revenue.

This brings us to our first recommendation of this case study.

Recommendation 1:

Women riders spend more time on the bike even though they take far fewer rides than men. Cyclistic should focus on bringing more women into our service.

Time on a bike by age

The general trend of age vs duration is that as age increases, the amount of time on a bike decreases. Therefore, we should focus more on young people when adding subscribers.

However, we see a significant increase in bike usage for people who are 75 years or older. This could be because of the health benefits of riding a bike.
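The age variable is not in the raw data, which only has `birthyear`. One plausible way to derive it, shown here as a sketch under assumptions rather than the report's actual code, is to subtract the birth year from the year of the trips (2019) and then average the duration per age. The toy rows and the names `age`, `duration_mins`, and `duration_by_age` are hypothetical.

```r
# Toy stand-in for the trips data; birth years and durations are made up.
trips <- data.frame(
  birthyear = c(1990, 1990, 1985, NA, 1944),
  tripduration = c(600, 900, 480, 700, 1500)  # assumed to be seconds
)

# Drop rides with no recorded birth year, then derive age as of 2019.
trips <- trips[!is.na(trips$birthyear), ]
trips$age <- 2019 - trips$birthyear
trips$duration_mins <- trips$tripduration / 60

# Mean ride duration (minutes) for each age.
duration_by_age <- aggregate(duration_mins ~ age, data = trips, FUN = mean)

print(duration_by_age)
```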

Checking the average duration on a bike by Age and User type

Running the code again without the outliers:
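The report does not state which rule was used to identify outliers. As one common option, the sketch below applies the 1.5 × IQR rule to a toy vector of durations; the rule, the values, and the variable names are all assumptions for illustration.

```r
# Toy durations in minutes; 600 is a made-up extreme value.
durations <- c(8, 10, 11, 12, 13, 14, 15, 600)

# Quartiles and interquartile range.
q <- quantile(durations, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Keep values within 1.5 * IQR of the quartiles (assumed rule, not the report's).
keep <- durations >= q[[1]] - 1.5 * iqr & durations <= q[[2]] + 1.5 * iqr
cleaned <- durations[keep]

print(cleaned)
```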

From this graph, we can see that the general trend of age vs duration is slightly decreasing for both customers and subscribers.

The mean duration for Subscribers is consistently lower than that for Customers. There are some interesting points to mention:

  • Riders aged 16-20 have an unusually high mean duration on a bike, as shaded in the graph.

  • Riders aged 55-70 also have an unusually high mean duration on a bike, as shaded in the graph.

This shows that the youngest and the older riders spend a lot more time on the bike, and hence our business strategy should focus on increasing Subscribers in these categories.

This brings us to our second recommendation of this case study.

Recommendation 2:

People aged 16-20 and 55-70 spend more time on a bike than other age groups. Marketing strategies should be devised to attract customers in these age groups.

Checking the duration by user type

The overall duration shows that customers on average spend more time on the bike than subscribers. This suggests that subscribers mainly use the bikes for short routes, while customers use them for longer routes. This would make sense if a customer's ticket costs the same regardless of the distance traveled between stations, while a subscriber can be charged less for shorter routes and more for longer routes.

It could also be that the people who ride longer routes are tourists who, because the bike service is not available where they live, see no value in buying a subscription.

Recommendation 3:

Make an incentive program in the subscription to attract people who use the bikes for longer routes.

Part 2

Importing data from the last 3 quarters of 2019 and first quarter of 2020
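Stacking several quarterly files into one table usually requires the columns to share names and codings first. The sketch below illustrates the idea on two tiny made-up data frames. The 2020 column names and labels used here (`ride_id`, `member_casual`, `member`/`casual`) are assumptions for illustration, as are all the values; only `trip_id`, `usertype`, and the name `all_trips` appear in the report.

```r
# Toy quarterly extracts; all values are made up for illustration.
q2_2019 <- data.frame(
  trip_id = c("22178529", "22178530"),
  usertype = c("Subscriber", "Customer"),
  stringsAsFactors = FALSE
)
q1_2020 <- data.frame(
  ride_id = "EXAMPLE000000001",   # assumed column name
  member_casual = "member",       # assumed column name and label
  stringsAsFactors = FALSE
)

# Rename the 2020 columns to match the 2019 layout.
names(q1_2020)[names(q1_2020) == "ride_id"] <- "trip_id"
names(q1_2020)[names(q1_2020) == "member_casual"] <- "usertype"

# Recode the 2020 labels to the 2019 labels.
q1_2020$usertype <- ifelse(q1_2020$usertype == "member", "Subscriber", "Customer")

# Stack the quarters into one data frame.
all_trips <- rbind(q2_2019, q1_2020)

print(all_trips)
```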

Previewing the data

Data summary
Name                     all_trips
Number of rows           3879822
Number of columns        10
_______________________
Column type frequency:
character                5
numeric                  3
POSIXct                  2
________________________
Group variables          None

Variable type: character

skim_variable      n_missing  complete_rate  min  max  empty  n_unique  whitespace
trip_id                    0              1    8   16      0   3879822           0
bikeid                     0              1    1   11      0      6004           0
from_station_name          0              1    5   43      0       643           0
to_station_name            1              1    5   43      0       644           0
usertype                   0              1    8   10      0         2           0

Variable type: numeric

skim_variable    n_missing  complete_rate     mean        sd    p0  p25  p50   p75     p100
tripduration             0              1  1333.90  28354.71  -9.2  328  645  1225  9056633
from_station_id          0              1   202.90    157.14   1.0   77  174   291      675
to_station_id            1              1   203.76    157.19   1.0   77  174   291      675

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min                  max                  median               n_unique
start_time             0              1  2019-04-01 00:02:22  2020-03-31 23:51:34  2019-08-14 17:43:38   3362333
end_time               0              1  2019-04-01 00:09:48  2020-05-19 20:10:34  2019-08-14 18:02:04   3299507
##  [1] "trip_id"           "start_time"        "end_time"         
##  [4] "bikeid"            "tripduration"      "from_station_id"  
##  [7] "from_station_name" "to_station_id"     "to_station_name"  
## [10] "usertype"
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     -0.20      5.50     10.80     22.23     20.40 150943.90
## # A tibble: 1 × 1
##   `min(tripduration)`
##                 <dbl>
## 1                -9.2

min(tripduration)
3.016667

Summary statistics for duration
statistic Value
Mean 22.40365
Median 10.80000
Standard Deviation 474.39737
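The output above shows a negative minimum duration before cleaning and a positive one afterwards. The exact cleaning rule is not stated in the report. The sketch below assumes the simplest version, dropping non-positive durations, and then computes the same three summary statistics on a toy data frame. All values and the names `cleaned` and `duration_stats` are hypothetical.

```r
# Toy stand-in for all_trips; durations are assumed to be in seconds.
all_trips <- data.frame(tripduration = c(-9.2, 328, 645, 1225, 9000))
all_trips$duration_mins <- all_trips$tripduration / 60

# Assumed cleaning rule: keep only rides with a positive duration.
cleaned <- all_trips[all_trips$duration_mins > 0, , drop = FALSE]

# Mean, median, and standard deviation of the cleaned durations (minutes).
duration_stats <- data.frame(
  statistic = c("Mean", "Median", "Standard Deviation"),
  value = c(
    mean(cleaned$duration_mins),
    median(cleaned$duration_mins),
    sd(cleaned$duration_mins)
  )
)

print(duration_stats)
```

With data this skewed, the median is a steadier summary than the mean, which is consistent with the large gap between the two in the output above.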

Compared to Part 1, this data set has significantly more bike trips (3,879,822 rows versus 365,069).

We see that the user type and duration data are in line with the data presented in Part 1. Customers use the bikes for longer routes, while subscribers use the bikes for shorter routes.

Hourly analysis

The hourly analysis shows that bike usage peaks from 7:00 am to 9:00 am and from 4:00 pm to 6:00 pm. This pattern suggests that many people use the bikes to commute to work.
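An hourly breakdown like this can be obtained by extracting the hour from each ride's start time and counting rides per hour. The following is a minimal sketch on made-up timestamps; the names `hour_of_day`, `rides_per_hour`, and `peak_hour` are illustrative.

```r
# Toy start times; the timestamps are made up for illustration.
start_time <- as.POSIXct(
  c("2019-08-14 07:15:00", "2019-08-14 08:40:00", "2019-08-14 08:55:00",
    "2019-08-14 17:05:00", "2019-08-14 12:30:00"),
  tz = "UTC"
)

# Extract the hour of day (0-23) from each start time.
hour_of_day <- as.integer(format(start_time, "%H"))

# Count rides per hour and find the busiest hour.
rides_per_hour <- table(hour_of_day)
peak_hour <- as.integer(names(rides_per_hour)[which.max(rides_per_hour)])

print(rides_per_hour)
print(peak_hour)
```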

Recommendation 4:

Create packages for people who commute to work.

Seasonality

Summer is the most popular season for riding bikes.
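The report does not say how months were assigned to seasons. The sketch below assumes meteorological seasons (December to February as winter, and so on) and counts rides per season on made-up timestamps; the helper `season_of` and the other names are hypothetical.

```r
# Toy start times; the timestamps are made up for illustration.
start_time <- as.POSIXct(
  c("2019-04-10 09:00:00", "2019-07-04 15:00:00", "2019-07-20 18:00:00",
    "2019-10-01 08:00:00", "2020-01-15 07:30:00"),
  tz = "UTC"
)

# Month number (1-12) of each ride.
month_num <- as.integer(format(start_time, "%m"))

# Map a month number to a season (assumed meteorological definition).
season_of <- function(m) {
  ifelse(m %in% c(12, 1, 2), "Winter",
    ifelse(m %in% 3:5, "Spring",
      ifelse(m %in% 6:8, "Summer", "Fall")))
}

season <- season_of(month_num)
rides_per_season <- table(season)

print(rides_per_season)
```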

Combining seasons and months

Recommendations

Based on the analysis in this case study, the following recommendations are made.

  1. Women riders spend more time on the bike even though they take far fewer rides than men. Cyclistic should focus on bringing more women into our service.

  2. People aged 16-20 and 55-70 spend more time on a bike than other age groups. Marketing strategies should be devised to attract customers in these age groups.

  3. Make an incentive program in the subscription to attract people who use the bikes for longer routes.

  4. Create packages for people who commute to work.