Introduction and Rationale of Project

This project is spearheaded by Lily Moreno, Director of Marketing, to understand the differences between the cycling uses of casual riders and annual members. By using data analysis tools and techniques, the team is going to discover patterns and behavior and other factors that will:

For the sake of reference, the following definition applies.


Business Task

The marketing analyst team has to find answers to the following questions:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

I am tasked with using all data available to answer question 1.

My deliverables

I will be producing the following deliverables for my part in the project:

  • Executive summary of business task and analysis
  • Documentation of cleaning and manipulation of data
  • Supporting visualizations and key findings
  • Main analysis and top three recommendations
  • An appendix of all data sources used

Expectations of key stakeholders

After company meetings, the expectations of key stakeholders are as follows:

Company executives - Data needs to be fully analyzed and presented to convince company executives that a marketing strategy targeting casual riders would result in greater profits.

Director of marketing - Data and analysis supporting or questioning the market strategy of targeting casual riders needs to be presented.

Marketing analyst members - Data and analysis on how casual riders differ in bike usage from annual members needs to be presented to other members.


Data Analysis Process

Key question: How do annual members and casual riders use Cyclistic bikes differently

To answer the above question, data from 12 months of trip data for the year 2022 is extracted. For a review of the data used in this analysis, please click this link. Data is provided by Motivate International Inc under a license from divvybikes. As we do not have first hand access to data, we need to check for integrity, accuracy, consistency and completeness during the data cleaning process.

There is also a need to check on how members and casual riders are charged for their rides. (Is it based on time spent on the bike, distance traveled, type of bike used etc?) According to my own research, members and casual riders are charged based on the information from divvybikes.

Information about data used

The set of data from Motivational Inc. provides monthly information about usage of bikes with parameters:

  • ride_id
  • start and end time
  • start station and end station
  • start and end latitude and longtitude
  • member types (casual or member)

To get an idea on the difference in behavior, data will be analyzed in the following aspects among members and casual riders:

  1. How frequently each type of bike is used
    1. number of rides per type of bike grouped by member type
    2. how number of rides per type of bike vary by month and week of days grouped by member type
  2. Time spent on each type of bike
    1. total and average duration per type of bike grouped by member type
    2. how total and average duration per type of bike varies by month grouped by member type

Based on pricing information provided by divvybikes, the revenue by customer type are calculated as follows:

Members Pricing

Classic bike

  • $10 per month for 45 minutes/ride
  • An extra $0.16/min beyond 45 minutes

Electric bike

  • $0.16/min per ride

Scooters

  • $0.25/min per ride

Casual Riders Pricing

Classic bike single rides

  • $1 per unlock
  • $0.16/min per ride

Classic bike day rides

  • $15 per day for 3 hours/ride
  • $0.16/min per ride beyond 3 hours

Electric bike/scooters rides

  • $1 per unlock
  • $0.39/min per ride

Limitations of data

Limitations of data includes not being able to link trip behavior to actual users as they are personally identifiable information, not being able to determine if riders live in the Cyclistic service area (and therefore how that impacts rider behavior) and whether each user made multiple trips with different ride_id. This impacts calculating revenue for member and full day pass rides as each member and full day pass rider may be tied to multiple ride_ids.

Assumptions about data

Since we cannot know what kind of plan each rider subscribed to for each ride, we assume the following:

  1. For casual riders, any classic bike trip duration above 93 minutes each ride are subscribers to day rides as the day ride of $15 dollars per day are more economic. I assume casual riders will go for the most economic option.

  2. Since we cannot attribute multiple ride_id to one rider from the trip data, only the revenue from trip duration charged by minutes is calculated. I assume there are member and day ride subscription lists which will give a straightforward calculation of total monthly revenues from membership fees and day ride subscribers.


Data Cleaning Worklog

The original data are kept intact and stored in a different folder from the cleansed and processed data to allow other teams to work with the original data.

The directory for the original data is %user%/desktop/bike_trips_data_raw.

The directory for the cleansed and processed data is %user%/desktop/bike_trips_data_processed.

Data sorting and filtering in Excel

Step 1. Since the distance traveled and location of bike riders are not important for determining the profitability of rides, the following columns of data are removed from the tables:

  • start_station_name
  • start_station_id
  • end_station_name
  • end_station_id
  • start_lat
  • start_lng
  • end_lat
  • end_lng

Step 2. To better analyse the difference in riding habits between members and casual riders, the table for each month of data is separated into data for members and casual riders. Any blank rows are also deleted in Excel using Home -> Find & Select -> Go to Special, select “Blanks” and then “Delete Sheet Rows”.

Step 3. Ensure columns started_at and ended_at are formatted as d/m/yyyy h:mm. Ensure the other columns are formatted as text.

Step 4. Set up another column called duration and format it as h:mm. Then use ended at - started at to get duration elapsed. Filter duration for any abnormalities and correct data that sorts ended at as started as and vice versa.

Step 5. To prevent accidental deletion of data, the sheet is protected with Review -> Protect Sheet. The following items are checked:

  • Select locked cells
  • Select unlocked cells
  • Format cells
  • Format columns
  • Format rows
  • Sort
  • Use PivotTable and PivotChart

Repeat all the steps 1 to 5 above for all member and casual data.

Data Analysis in SQL

Step 1. Upload all the data into SQL.

Casual riders data

Step 2. To determine the day of week spent on each ride, I extracted the start and end dates of the rides. For the casual riders, I used the following SQL query on each table:

SELECT  

ride_id,
rideable_type AS bike_type,
EXTRACT(DATE FROM started_at) AS start_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM started_at)) AS start_day,
EXTRACT(DATE FROM ended_at) AS end_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM ended_at)) AS end_day,
type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_casual.2022-xx-biketripdata-c`

WHERE rideable_type = "classic_bike"

ORDER BY started_at

Then I choose “Save Results” another BigQuery table and name it “2022_xx_cclassicbikedata_dayofweek”. I repeat this step for electric_bike (2022_xx_celectricbikedata_dayofweek) and docked_bike (2022_xx_cdockedbikedata_dayofweek).

To determine number of classic bike rides from casual riders for every day of each month, I used the following SQL query on each table:

SELECT 

start_day,
COUNT(ride_id) AS number_of_rides,

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_cclassicbikedata_dayofweek` 

 GROUP BY start_day

Then I choose “Save Results” as local CSV file and name it “2022_xx_cclassicbikedata_bydays”.I repeat this step for electric_bike (2022_xx_celectricbikedata_bydays) and docked_bike (2022_xx_cdockedbikedata_bydays).

Step 3. To determine the time spent on each ride, columns are separated into date and time. For the casual riders, I used the following SQL query on each table:

SELECT  

  rideable_type AS bike_type,
  DATETIME_DIFF(ended_at, started_at, MINUTE) AS duration_of_trip_mins,
  type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_casual.2022-xx-biketripdata-c` 

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_biketripedatawithduration_c

Step 4. To determine the duration of time spent on classic bikes by casual riders, I run the following query:

SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_cclassicbikedata_duration”.

Step 5. To determine the duration of time spent on electric bikes by casual riders, I run the following query:

SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_celectricbikedata_duration”.

Step 6. To determine the duration of time spent on docked bikes by casual riders, I run the following query:

SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "docked_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_cdockedbikedata_duration”.

To determine minimum, maximum, average and total duration of classic bike rides from casual riders for each month, I used the following SQL query on each table:

SELECT  

bike_type,
MIN(duration_of_trip_mins) AS min_duration_of_trips_mins,
MAX(duration_of_trip_mins) AS max_duration_of_trips_mins,
AVG(duration_of_trip_mins) AS average_duration_of_trips_mins,
SUM(duration_of_trip_mins) AS total_duration_of_trips_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_cclassicbikedata_duration` 

GROUP BY bike_type

Then I choose “Save Results” as local CSV file and name it “2022_xx_cclassicbikedata_bymins”.I repeat this step for electric_bike (2022_xx_celectricbikedata_bymins) and docked_bike (2022_xx_cdockedbikedata_bymins).

Step 7: Then based on our assumption of classic bike day rides, I created the following query to filter out a table for day rides on classic bikes based on duration of trips above 93 minutes:

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE duration_of_trip_mins > 93
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_biketripdatadayrides_c

Step 8. Next I create another table for classic bike single rides for all duration of trips equal or less than 93 minutes.

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE duration_of_trip_mins <= 93 
and bike_type="classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_biketripdatasinglerides_c

Step 9. Finally I create a third table for all ride durations for non classic bikes.

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type!="classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_biketripdata_nonclassicbikes_c

Step 10. Now to sort out a table of classic bike day rides who pay no more than $15, I used the following query:

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins <= 180 

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_dayridesnotmorethan3hours_c”.

Step 11. To further organise the classic bike day rides, a table is created to filter out the results for day rides with trip duration above 180 minutes (3 hours) but not more than 1440 minutes (a day).

SELECT  

bike_type,
duration_of_trip_mins,
duration_of_trip_mins - 180 AS trip_duration_charged_by_min,
type_of_rider 

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins > 180 and duration_of_trip_mins <= 1440

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_dayrideswithinoneday_c

Step 12. Finally, a table for classic bike day rides that exceed 1 day is created with the following query:

SELECT  

bike_type,
duration_of_trip_mins,
1440 - 180 AS day1_trip_of_duration_charged_by_min,
CASE
  WHEN duration_of_trip_mins > 1620 
  THEN "duration_of_trips_mins - 1620"
  ELSE NULL 
  END AS day2_trip_of_duration_charged_by_min,
type_of_rider,

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins > 1440

ORDER BY duration_of_trip_mins

This creates two extra columns which calculates the trip duration charged by minutes for day 1 as well as day 2 if the trip duration for day 2 exceeds 3 hours. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_dayridesaboveoneday_c”.

The above steps 1 to 12 are repeated for all 12 months of casual riders data.

Member riders data

Step 2. To determine the day of week spent on each ride, I extracted the start and end dates of the rides. For the members, I used the following SQL query on each table:

SELECT  

ride_id,
rideable_type AS bike_type,
EXTRACT(DATE FROM started_at) AS start_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM started_at)) AS start_day,
EXTRACT(DATE FROM ended_at) AS end_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM ended_at)) AS end_day,
type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_member.2022-xx-biketripdata-m`

WHERE rideable_type = "classic_bike"

ORDER BY started_at

Then I choose “Save Results” another BigQuery table and name it “2022_xx_mclassicbikedata_dayofweek”. I repeat this step for rideable_type = “electric_bike” (2022_xx_melectricbikedata_dayofweek).

To determine number of classic bike rides from members for every day of each month, I used the following SQL query on each table:

SELECT 

start_day,
COUNT(ride_id) AS number_of_rides,

 FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_mclassicbikedata_dayofweek` 

 GROUP BY start_day

Then I choose “Save Results” as local CSV file and name it “2022_xx_mclassicbikedata_bydays”.I repeat this step for electric_bike (2022_xx_melectricbikedata_bydays).

Step 3. To determine the time spent on each ride, columns are separated into date and time. For the member riders, I used the following SQL query on each table:

SELECT  

  rideable_type AS bike_type,
  DATETIME_DIFF(ended_at, started_at, MINUTE) AS duration_of_trip_mins,
  type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_member.2022-xx-biketripdata-m` 

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_biketripedatawithduration_m”.

Step 4. To determine the duration of time spent on classic bikes by members, I run the following query:

SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_mclassicbikedata_duration”.

Step 5. To determine the duration of time spent on electric bikes by members, I run the following query:

SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_melectricbikedata_duration”.

To determine minimum, maximum, average and total duration of classic bike rides from members for each month, I used the following SQL query on each table:

SELECT  

bike_type,
MIN(duration_of_trip_mins) AS min_duration_of_trips_mins,
MAX(duration_of_trip_mins) AS max_duration_of_trips_mins,
AVG(duration_of_trip_mins) AS average_duration_of_trips_mins,
SUM(duration_of_trip_mins) AS total_duration_od_trips_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_mclassicbikedata_duration` 

GROUP BY bike_type

Then I choose “Save Results” as local CSV file and name it “2022_xx_mclassicbikedata_bymins”.I repeat this step for electric_bike (2022_xx_melectricbikedata_bymins).

Step 6. Next, I filter out members who travel for less than 45 minutes each ride using classic bikes.

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE duration_of_trip_mins <= 45
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikeridesnotabove45mins_m”.

Step 7. Next, I filter out members who travel for more than 45 minutes each ride using classic bikes. I also calculate a new column of trip duration charged by minutes.

SELECT  

bike_type,
duration_of_trip_mins,
duration_of_trip_mins - 45 AS trip_duration_charged_by_min,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE duration_of_trip_mins > 45
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikeridesabove45mins_m”.

Step 8. Next, I filter out members who travel using electric bikes.

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins

Then I choose “Save Results” as another BigQuery table and name it “2022_xx_electricbikerides_m”.

Step 9. Finally as a safety check, I query for all results that are neither classic bike or electric bike.

SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type!="electric_bike" AND bike_type!="classic_bike"

ORDER BY duration_of_trip_mins

If there is a table for this query, I choose “Save Results” as another BigQuery table and name it “2022_xx_otherbikerides_m”.

The above steps 1 to 9 are repeated for all 12 months of member riders data.

Summarized Data For Revenue Calculation (SQL)

Casual riders data

Step 1. For ease of calculating revenues, I query the table about classic bike single rides for casual riders using the following SQL query:

SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatasinglerides_c`

GROUP BY bike_type

This counts the total number of rides on classic bikes for single ride subscribers and also the total duration of all single rides on classic rides. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikesinglerides_c”.

Step 2. Then I create a table that summarises number of one day rides not above 3 hours.

SELECT  

bike_type,
COUNT(bike_type) AS number_of_rides

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayridesnotmorethan3hours_c` 

GROUP BY bike_type

This counts the total number of rides on classic bikes for day ride subscribers with rides not exceeding 3 hours. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikesonedayrides_notabove3hours_c”.

Step 3. Next , I create a table that summarises number of one day rides and total duration charged by minutes.

SELECT  

bike_type,
COUNT(bike_type) AS no_of_oneday_rides,
SUM(trip_duration_charged_by_min) AS total_duration_charged_by_min

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayrideswithinoneday_c`

GROUP BY bike_type

This counts the total number of rides on classic bikes for one day ride subscribers and total duration charged by minutes. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikesonedayrides_notabove1day_c”.

Step 4. Next, I create a table that summarises number of two day rides and total duration charged by minutes for first day.

SELECT  

bike_type,
COUNT(bike_type) AS no_of_twoday_rides,
SUM(day1_trip_of_duration_charged_by_min) AS total_day1_trip_duration_charged_by_min

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayridesaboveoneday_c`

GROUP BY bike_type

This counts the total number of rides on classic bikes for two day ride subscribers and total duration charged by minutes for first day. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikestwodayrides_c”.

Step 5. Finally, I create a table that summarizes number of rides and total duration of rides charged by other bike types.

SELECT  

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_charged

FROM `case-study-divvybikes.2022_nonclassicbiketripdata_casual.2022_xx_biketripdata_nonclassicbikes_c`

GROUP BY bike_type

This counts the total number of rides on other bikes for casual riders and total duration charged by minutes. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_otherbikerides_c”.

The above steps 1 to 5 are repeated for all 12 months of casual riders data.

Member riders data

Step 1. For ease of calculating revenues, I query the table about classic bike rides not above 45 minutes for members using the following SQL query:

SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_trip_duration

`case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_classicbikeridesnotabove45mins_m`

GROUP BY bike_type

This counts the total number of rides on classic bikes for members whose rides do not exceed 45 minutes. In other words, each ride is only charged monthly. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikerideschargedmonthly_m”.

Step 2. Then I create a table that summarises number of classic bike rides above 45 minutes for members.

SELECT  

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(trip_duration_charged_by_min) AS total_trip_duration_charged

FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_classicbikeridesabove45mins_m` 

GROUP BY bike_type

This counts the total number of classic bike rides for members and total duration charged by minutes. In other words, each ride has a monthly charge as well as a charge per minute exceeding 45 minutes. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_classicbikerideswithminutecharges_m”.

Step 3.

SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_charged

FROM `case-study-divvybikes.2022_nonclassicbiketripdata_member.2022_xx_electricbikerides_m`

GROUP BY bike_type

This counts the total number of classic bike rides for members and total duration charged by minutes. In other words, each ride has a monthly charge as well as a charge per minute exceeding 45 minutes. Then I choose “Save Results” as another BigQuery table and name it “2022_xx_electricbikerideschargedbymin_m”.

The above steps 1 to 3 are repeated for all 12 months of member riders data.


Data Visualization in Excel, Tableau and R

Excel

After creating all those monthly tables in SQL BigQuery, I downloaded them into Excel and put them together into annual tables of data to compare trends across months. The following Excel tables are assembled:

  • 2022_annual_classicbikedata_bydays_casual
  • 2022_annual_classicbikedata_bydays_member
  • 2022_annual_classicbikedata_bymins_casual
  • 2022_annual_classicbikedata_bymins_member
  • 2022_annual_dockedbikedata_bydays_casual
  • 2022_annual_dockedbikedata_bymins_casual
  • 2022_annual_electricbikedata_bydays_casual
  • 2022_annual_electricbikedata_bydays_member
  • 2022_annual_electricbikedata_bymins_casual
  • 2022_annual_electricbikedata_bymins_member
  • 2022rev_classicbikeonedayrides_notabove1day_c
  • 2022rev_classicbikeonedayrides_notabove3hoursperride_c
  • 2022rev_classicbikerideswithmonthly&mincharges_m
  • 2022rev_classicbikerideswithonlymonthlycharge_m
  • 2022rev_classicbikesinglerides_c
  • 2022rev_classicbiketwodayrides_c
  • 2022rev_dockedbikerides_c
  • 2022rev_electricbikerides_c
  • 2022rev_electricbikechargedbymin_m

Based on these tables, I used Excel to draw up charts to visualize bike usage and revenue patterns throughout the year.

R

I also attempted to draw line plot on one data table of the classic bike riding data for casual users using the following code in R.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(data.table)
library(readr)
X2022_annual_classicbikedata_bymonth_casual <- read_csv("bike_trips_data_processed/2022_annual_classicbikedata_bymonth_casual.csv")

## Setting a data frame as data table
setDT(X2022_annual_classicbikedata_bymonth_casual) 

## Sorts table into long form 
df1 <- melt(X2022_annual_classicbikedata_bymonth_casual,"Months")
print(df1) 

## Reorders Months from Jan to Dec
df1$Months = factor(df1$Months, levels = month.abb) 

## Actual plot
ggplot(df1, aes(x=Months, y=value, group=variable, colour=variable)) +
  geom_line()+
  geom_point()+
  ylab("Number of Rides")+
  labs(colour="Days of Week")+
  labs(title="Classic Bike Rides - Casual")

I drew a bar chart on another data table of the electric bike riding data for members using the following code in R.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(data.table)
library(readr)
X2022_annual_electricbikedata_byduration_member <- read_csv("bike_trips_data_processed/2022_annual_electricbikedata_byduration_member.csv")
X2022_annual_electricbikedata_byavgduration_member <- read_csv("bike_trips_data_processed/2022_annual_electricbikedata_byavgduration_member.csv")

## Setting data frames as data tables
setDT(X2022_annual_electricbikedata_byduration_member)
setDT(X2022_annual_electricbikedata_byavgduration_member)

## Sorts table into long form 
df2 <- melt(X2022_annual_electricbikedata_byduration_member,"month")
print(df2) 
df3 <- melt(X2022_annual_electricbikedata_byavgduration_member,"month")
print(df3)

## Reorders Months from Jan to Dec
df2$month = factor(df2$month, levels = month.abb)
df3$month = factor(df3$month, levels = month.abb) 

## Actual plots
ggplot(df2, aes(month, value, fill=variable, label=value)) +
  geom_bar(stat="identity", position = "dodge")+
  geom_text(vjust=-0.5, size=1.5)+
  ylab("Duration (Days)")+
  labs(fill="")+
  labs(title="Electric Bike Rides - Member")

ggplot(df3, aes(month, value, fill=variable, label=value)) +
  geom_bar(stat="identity", position = "dodge")+
  geom_text(vjust=-0.5, size=1.5)+
  ylab("Duration (Mins)")+
  labs(fill="")+
  labs(title="Electric Bike Rides - Member")

Tableau

I used Tableau to give better visualisations on the starting location of bike usage by casual riders as well as trip revenue from classic bike for members in 2022.

Viz 1: Trip Revenue of Classic Bike Rides for Members

Viz 2: Starting Location for Casual Riders


Analysis & Recommendations

Riding patterns of casual riders and members

The following comparisons were made between casual riders and members:

  1. Riding patterns every day of the week by each month of 2022
  2. Riding patterns every month of 2022 by each day of week
  3. Revenue patterns for 2022 for classic and electric bikes

Since we cannot trace each ride id to each rider, it is impossible to determine the true revenue from casual riders and members unless we are given more information about membership subscription revenue and day pass revenues. Alternatively, it is better to compare trip revenue charged by minute for each type of bike.

Classic bikes

For both casual riders and members, number of rides for classic bikes seem to peak around the middle of the year and decline by the end of the year. For casual riders, the peak is in July on Saturdays, reaching 40k rides. For members, the peak is in Jun on Thursdays, reaching 42k rides. For both casual riders and members, number of rides for classic bikes are lowest in January and February. For casual riders, number of rides are generally higher in June and July while that for members is in June. In general, members have higher number of rides for classic bikes than casual riders.

Number of classic bike rides of causal riders by days

Number of classic bike rides of members by days

Number of classic bike rides of casual riders by month

Number of classic bike rides of members by month

However, if we look at duration of rides, casual riders have higher total duration of rides at its peak compared to members. This could be due to the greater number of casual riders than members of classic bikes or due to the longer time casual riders spend on classic bike rides during the peak month (June). This is answered partly by looking at the average duration of each ride on classic bikes by casual riders and members. In general, casual riders spend a longer average time on classic bikes (over 20 minutes per ride every month) with the highest average duration being 32 minutes in March. Members, on the other hand, spend less than 20 minutes per ride every month with highest average duration being 15 minutes in July.

Total duration on classic bikes by casual riders

Total duration on classic bikes by members

Average duration each ride on classic bikes by casual riders

Average duration each ride on classic bikes by members

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for members are generally higher than casual riders. The total number of rides for casual members tend to peak on Saturday whilst that of members tend to be on Tuesdays. This could be due to the fact that casual riders tend to use classic bikes for leisure while members tend to use classic bikes for work travel.

Total classic bike rides of casual riders by days

Total classic bike rides of members by days

Total classic bike rides of casual riders by months

Total classic bike rides of members by months

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur for single rides. Thus they may choose to ride lesser number of times but extend the duration of each ride since the cost of $0.16 per minute is the same as members for classic bikes. On the other hand, the number of rides is irrelevant to members who do not have to pay unlock costs per ride but they may want to save money by limiting each ride duration to no more than 45 minutes as much as possible.

For classic bikes, trip revenues for both members and casual riders tend to peak in June. Trip revenue for members are only more profitable than casual riders who buy the full day pass for one day. Trip revenue is highest for casual riders who buy single rides.

Trip revenue of classic bike for members

Trip revenue of classic bike full day pass for casual riders

Trip revenue of classic bike single rides for casual riders

Trip revenue of classic bike two full day passes for casual riders

Electric bikes

For casual riders, number of rides for electric bikes seem to peak around July while for members, the peak happens in September and decline by the end of the year. For casual riders, the peak is on Saturdays, reaching 47k rides. For members, the peak is on Thursdays, reaching 39k rides. For both casual riders and members, number of rides for electric bikes are lowest in January and February. For casual riders, number of rides are generally higher in July while that for members is in August. In general, members have higher number of rides for electric bikes than casual riders.

Number of electric bike rides of causal riders by days

Number of electric bike rides of members by days

Number of electric bike rides of casual riders by month

Number of electric bike rides of members by month

However, if we look at duration of rides, casual riders have higher total duration of rides at its peak compared to members. This could be due to the greater number of casual riders than members of electric bikes or due to the longer time casual riders spend on electric bike rides during the peak month. This is answered partly by looking at the average duration of each ride on electric bikes by casual riders and members. In general, casual riders spend a longer average time on electric bikes with the highest average duration being 19 minutes per ride in May. Members, on the other hand, spend less than 15 minutes every month with highest average duration being 13 minutes per ride in June.

Total duration on electric bikes by casual riders

Total duration on electric bikes by members

Average duration each ride on electric bikes by casual riders

Average duration each ride on electric bikes by members

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for members are generally higher than casual riders. The total number of rides for casual members tend to peak on Saturday whilst that of members tend to be on Thursdays. This could be due to the fact that casual riders tend to use electric bikes for leisure while members tend to use electric bikes for work travel.

Total electric bike rides of casual riders by days

Total electric bike rides of members by days

Total electric bike rides of casual riders by months

Total electric bike rides of members by months

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur. Thus they may choose to ride lesser number of times but extend the duration of each ride. But since the charges on classic bikes are cheaper per minute, they would spend more time on average riding classic bikes than electric bikes. On the other hand, the number of rides is irrelevant to members who do not have to pay unlock costs per ride but they may want to save costs by spending more time on average on classic bikes since there are no charges for the first 45 minutes per ride.

For electric bikes, trip revenues for casual riders tend to peak in July whilst that for members tend to peak in August. Trip revenue for casual riders are always more profitable than members.

Trip revenue of electric bike for members

Trip revenue of classic bike for casual riders

Docked bikes

Since we do not have data for members who ride docked bikes, we look at riding patterns for casual riders instead. For casual riders, number of rides for docked bikes seem to peak at 8k rides around July on Saturdays and decline by the end of the year. Number of rides for docked bikes are lowest in December, January and February. Number of rides are generally higher in June. In general, casual riders have lowest number of rides for docked bikes.

Number of docked bike rides of causal riders by days

Number of classic bike rides of causal riders by days

Number of docked bike rides of causal riders by days

Number of docked bike rides of casual riders by month

Number of classic bike rides of casual riders by month

Number of electric bike rides of casual riders by month

However, if we look at duration of rides, docked bikes have second highest total duration of rides at its peak compared to other bike types. This could be due to the longer time casual riders spend on docked bike rides during the peak month since the number of rides are very low. This is answered by looking at the average duration of each ride on docked bikes by casual riders. In general, casual riders spend the longest average time on docked bikes compared to other bikes with the highest average duration being 223 minutes in Jan.

Total duration on docked bikes by casual riders

Total duration on classic bikes by casual riders

Total duration on electric bikes by casual riders

Average duration each ride on docked bikes by casual riders

Average duration each ride on classic bikes by casual riders

Average duration each ride on electric bikes by casual riders

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for casual riders tend to peak on Saturday and in June and July. This could be due to the fact that casual riders tend to use docked bikes for leisure.

Total docked bike rides of casual riders by days

Total classic bike rides of casual riders by days

Total electric bike rides of casual riders by days

Total docked bike rides of casual riders by months

Total classic bike rides of casual riders by months

Total electric bike rides of casual riders by months

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur. Thus they may choose to ride lesser number of times but extend the duration of each ride. But it is unclear why docked bikes have the highest average duration of rides compared to other bikes for casual riders.

For docked bikes, trip revenues for casual riders tend to peak in June. Trip revenue for casual riders are second most profitable compared to other bikes (after electric bikes) assuming a $0.39 charge per minute.

Trip revenue of electric bike for members

Solutions to business task

Q1. Who should we target for membership marketing?

Classic bikes

For frequent casual riders of classic bikes, we need to look at how much they will spend if they switch from casual riding plans to membership to determine whether membership subscription yields higher profits.

We assume that casual riders will always go for the cheapest options available to them without membership. Let us examine some scenarios.

For every member who rides no more than 45 minutes per ride, annual revenue from this member is only the annual membership fee of 120 dollars. But for casual riders who rides no more than 45 minutes per ride, annual revenue per rider is at least 123 dollars if the number of rides each year is above 14. The calculation of annual revenues are as follows:

  • Annual revenue (casual rider) = (45 (mins) x 0.16 (/min) + 1 (unlock cost)) x 15 (rides/year) = $123

  • Annual revenue (member) = 10 (/mth) x 12 (mths) = $120

For every member who rides for 3 hours per ride, annual revenue from each member is around 8,000 dollars since rides above 45 minutes are charged at $0.16 per minute. But for casual riders who rides no more than 3 hours per ride, annual revenue per rider is 5,475 dollars as they will choose the day plan of 15 dollars each day as the cheaper option. The calculation of annual revenues are as follows:

  • Annual revenue (casual rider) = 365 (days) x 15 (/day) = $5,475

  • Annual revenue (member) = ((180-45) (mins) x 0.16 (/min) x 365 (days) + 120 (membership fee) = $8,004

For every member and rider who rides more than 3 hours per ride, the cost per ride after 3 hours is the same, so which plan is more profitable depends on the revenue for the first 3 hours. Thus it is similar to the scenario above. The calculation of annual revenues are as follows:

  • Annual revenue (casual rider) = 365 (days) x 15 (/day) = $5,475

  • Annual revenue (member) = ((180-45) (mins) x 0.16 (/min) x 365 (days) + 120 (membership fee) = $8,004

Thus, I propose the following recommendations to maximize revenue:

  • For casual riders who ride no more than 45 minutes per ride, only promote membership to them if they ride for less than 15 rides each year on classic bikes.

  • For casual rider who ride more than 45 minutes per ride, always promote membership to them. Membership will always bring in more revenue regardless of the number of rides each year.

Electric bikes and docked bikes

For frequent casual riders of electric bikes, we need to look at how much they will spend if they switch from casual riding plans to membership to determine whether membership subscription yields higher profits. Let us examine some scenarios.

For every member who rides 1 minute per ride and only once per day, annual revenue from this member is 178 dollars given the $0.16 per min charge. But for casual riders who ride the same frequency, annual revenue per rider is at least 507 dollars. The calculation of annual revenues are as follows:

  • Annual revenue (casual rider) = (0.39 (/min) x 1 (min) + 1 (unlock cost)) x 365 (days) = $507

  • Annual revenue (member) = (0.16 (/min) x 1 (min) x 365 (days)) + 120 (membership fee) = $178

Membership only becomes more profitable under this revenue calculation:

  • Annual revenue (member) = (0.35 (/min) x 1 (min) x 365 (days)) + 380 (membership fee) = $508

Similarly for scooters, revenue from casual riders are higher.

The calculation of annual revenues are as follows:

  • Annual revenue (casual rider) = (0.39 (/min) x 1 (min) + 1 (unlock cost)) x 365 (days) = $507

  • Annual revenue (member) = (0.25 (/min) x 1 (min) x 365 (days)) + 120 (membership fee) = $211

Membership only becomes more profitable under this revenue calculation:

  • Annual revenue (member) = (0.35 (/min) x1 (min) x 365 (days)) + 380 (membership fee) = $508

Thus, I propose the following recommendations to maximise revenue:

  • Raise membership fees to minimum 380 dollars per year or 32 dollars per month and cost per minute to $0.35 for all new members and promote membership to casual riders who ride electric bikes and scooters only once per day and 1 min per ride.

Q2. How should we target casual riders based on their riding patterns?

Since casual riders prefer to use all types of bikes for leisure from the preliminary data we have, we can promote membership subscription by months ($32 per month) during peak periods to maximise revenue for electric and docked bikes.

For classic bikes, the original membership fee of 120 dollars per year can be promoted to casual riders who meet the criteria of riding above 45 minutes per ride.

Q3. What other data do we need to gather or collect?

To better access the profitability of our membership subscription, full day passes and better promote membership based on preferences of casual riders, we need to gather or collect the following information:

  1. Survey on the kind of bikes casual riders prefer to use and the purpose (for work or leisure)
  2. Current total revenue from membership subscriptions
  3. Total revenue from full days passes
  4. Number of casual riders who fit into the profile to promote membership to
  5. Information on why docked bikes have highest ride duration

Appendix

Sources of data

  1. Original trip data by divvybikes: https://divvy-tripdata.s3.amazonaws.com/index.html
  2. Pricing information by divvybikes: https://divvybikes.com/
---
title: "How Casual Riders Can Be Converted To Members"
author: "Yining Shi"
date: "7 Jan 2023"
output:
  html_notebook: default
  word_document: default
  pdf_document: default
  html_document:
    df_print: paged
---
____________________________________________________
# Introduction and Rationale of Project

This project is spearheaded by Lily Moreno, Director of Marketing, to understand the differences between the cycling uses of casual riders and annual members. By using data analysis tools and techniques, the team is going to discover patterns and behavior and other factors that will:

* *Motivate casual users to sign up as annual members*, and
* *How digital media can be used to target marketing towards casual users*

For the sake of reference, the following definition applies.

* **Casual riders**: Customers who purchase single or full day passes
* **Annual members**: Customers who purchase annual memberships

____________________________________________________
# Business Task

The marketing analyst team has to find answers to the following questions:

1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

I am tasked with using all data available to answer question 1.

## My deliverables

I will be producing the following deliverables for my part in the project:

* Executive summary of business task and analysis
* Documentation of cleaning and manipulation of data
* Supporting visualizations and key findings
* Main analysis and top three recommendations
* An appendix of all data sources used

## Expectations of key stakeholders

After company meetings, the expectations of key stakeholders are as follows:

Company executives - Data needs to be fully analyzed and presented to convince company executives that a marketing strategy targeting casual riders would result in greater profits.

Director of marketing - Data and analysis supporting or questioning the market strategy of targeting casual riders needs to be presented.

Marketing analyst members - Data and analysis on how casual riders differ in bike usage from annual members needs to be presented to other members.

____________________________________________________
# Data Analysis Process

Key question: **How do annual members and casual riders use Cyclistic bikes differently**

To answer the above question, data from 12 months of trip data for the year 2022 is extracted. For a review of the data used in this analysis, please click [this link](https://divvy-tripdata.s3.amazonaws.com/index.html). Data is provided by **Motivate International Inc** under a [license](https://ride.divvybikes.com/data-license-agreement) from divvybikes. As we do not have first hand access to data, we need to check for *integrity, accuracy, consistency and completeness* during the data cleaning process.

There is also a need to check on how members and casual riders are charged for their rides. (Is it based on time spent on the bike, distance traveled, type of bike used etc?) According to my own research, members and casual riders are charged based on the information from [divvybikes](https://divvybikes.com/).

## Information about data used

The set of data from Motivational Inc. provides monthly information about usage of bikes with parameters:
    
* ride_id
* start and end time
* start station and end station
* start and end latitude and longtitude
* member types (*casual* or *member*)

To get an idea on the difference in behavior, data will be analyzed in the following aspects among members and casual riders:

1) **How frequently each type of bike is used**
    a) number of rides per type of bike grouped by member type
    b) how number of rides per type of bike vary by month and week of days grouped by member type
2) **Time spent on each type of bike**
    a) total and average duration per type of bike grouped by member type
    b) how total and average duration per type of bike varies by month grouped by member type

Based on pricing information provided by divvybikes, the revenue by customer type are calculated as follows:

#### Members Pricing

**Classic bike**

* $10 per month for 45 minutes/ride 
* An extra $0.16/min beyond 45 minutes

**Electric bike**

* $0.16/min per ride

**Scooters**

* $0.25/min per ride

#### Casual Riders Pricing

**Classic bike single rides**

* $1 per unlock
* $0.16/min per ride

**Classic bike day rides**


* $15 per day for 3 hours/ride
* $0.16/min per ride beyond 3 hours


**Electric bike/scooters rides**

* $1 per unlock
* $0.39/min per ride

## Limitations of data

Limitations of data includes not being able to link trip behavior to actual users as they are personally identifiable information, not being able to determine if riders live in the Cyclistic service area (and therefore how that impacts rider behavior) and whether each user made multiple trips with different ride_id. This impacts calculating revenue for member and full day pass rides as each member and full day pass rider may be tied to multiple ride_ids.

## Assumptions about data

Since we cannot know what kind of plan each rider subscribed to for each ride, we assume the following:

1. For casual riders, any classic bike trip duration above 93 minutes each ride are subscribers to day rides as the day ride of $15 dollars per day are more economic. I assume casual riders will go for the most economic option.

2. Since we cannot attribute multiple ride_id to one rider from the trip data, only the revenue from trip duration charged by minutes is calculated. I assume there are member and day ride subscription lists which will give a straightforward calculation of total monthly revenues from membership fees and day ride subscribers. 

____________________________________________________
# Data Cleaning Worklog

The original data are kept intact and stored in a different folder from the cleansed and processed data to allow other teams to work with the original data. 

The directory for the original data is *%user%/desktop/bike_trips_data_raw*.

The directory for the cleansed and processed data is *%user%/desktop/bike_trips_data_processed*.

## Data sorting and filtering in Excel

Step 1. Since the distance traveled and location of bike riders are not important for determining the profitability of rides, the following columns of data are removed from the tables:

* **start_station_name**
* **start_station_id**
* **end_station_name**
* **end_station_id**
* **start_lat**
* **start_lng**
* **end_lat**
* **end_lng**

Step 2. To better analyse the difference in riding habits between members and casual riders, the table for each month of data is separated into data for members and casual riders. Any blank rows are also deleted in Excel using Home -> Find & Select -> Go to Special, select "Blanks" and then "Delete Sheet Rows".

Step 3. Ensure columns **started_at** and **ended_at** are formatted as *d/m/yyyy h:mm*. Ensure the other columns are formatted as *text*.

Step 4. Set up another column called duration and format it as *h:mm*. Then use ended at - started at to get duration elapsed. Filter duration for any abnormalities and correct data that sorts ended at as started as and vice versa.

Step 5. To prevent accidental deletion of data, the sheet is protected with Review -> Protect Sheet. The following items are checked:

* Select locked cells
* Select unlocked cells
* Format cells
* Format columns
* Format rows
* Sort
* Use PivotTable and PivotChart

Repeat all the steps 1 to 5 above for all member and casual data.

## Data Analysis in SQL

Step 1. Upload all the data into SQL.

### Casual riders data

Step 2. To determine the day of week spent on each ride, I extracted the start and end dates of the rides. For the casual riders, I used the following SQL query on each table:

```{mysql eval=FALSE}

SELECT  

ride_id,
rideable_type AS bike_type,
EXTRACT(DATE FROM started_at) AS start_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM started_at)) AS start_day,
EXTRACT(DATE FROM ended_at) AS end_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM ended_at)) AS end_day,
type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_casual.2022-xx-biketripdata-c`

WHERE rideable_type = "classic_bike"

ORDER BY started_at
```

Then I choose "Save Results" another BigQuery table and name it "**2022_xx_cclassicbikedata_dayofweek**". I repeat this step for electric_bike (**2022_xx_celectricbikedata_dayofweek**) and docked_bike (**2022_xx_cdockedbikedata_dayofweek**).

To determine number of classic bike rides from casual riders for every day of each month, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT 

start_day,
COUNT(ride_id) AS number_of_rides,

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_cclassicbikedata_dayofweek` 

 GROUP BY start_day
```

Then I choose "Save Results" as local CSV file and name it "**2022_xx_cclassicbikedata_bydays**".I repeat this step for electric_bike (**2022_xx_celectricbikedata_bydays**) and docked_bike (**2022_xx_cdockedbikedata_bydays**).

Step 3. To determine the time spent on each ride, columns are separated into date and time. For the casual riders, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT  

  rideable_type AS bike_type,
  DATETIME_DIFF(ended_at, started_at, MINUTE) AS duration_of_trip_mins,
  type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_casual.2022-xx-biketripdata-c` 
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_biketripedatawithduration_c**"

Step 4. To determine the duration of time spent on classic bikes by casual riders, I run the following query:

```{mysql eval=FALSE}
SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_cclassicbikedata_duration**".

Step 5. To determine the duration of time spent on electric bikes by casual riders, I run the following query:

```{mysql eval=FALSE}
SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins

```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_celectricbikedata_duration**".

Step 6. To determine the duration of time spent on docked bikes by casual riders, I run the following query:

```{mysql eval=FALSE}
SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type = "docked_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_cdockedbikedata_duration**".

To determine minimum, maximum, average and total duration of classic bike rides from casual riders for each month, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT  

bike_type,
MIN(duration_of_trip_mins) AS min_duration_of_trips_mins,
MAX(duration_of_trip_mins) AS max_duration_of_trips_mins,
AVG(duration_of_trip_mins) AS average_duration_of_trips_mins,
SUM(duration_of_trip_mins) AS total_duration_of_trips_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_cclassicbikedata_duration` 

GROUP BY bike_type
```

Then I choose "Save Results" as local CSV file and name it "**2022_xx_cclassicbikedata_bymins**".I repeat this step for electric_bike (**2022_xx_celectricbikedata_bymins**) and docked_bike (**2022_xx_cdockedbikedata_bymins**).

Step 7: Then based on our assumption of classic bike day rides, I created the following query to filter out a table for day rides on classic bikes based on duration of trips above 93 minutes:

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE duration_of_trip_mins > 93
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_biketripdatadayrides_c**"

Step 8. Next I create another table for classic bike single rides for all duration of trips equal or less than 93 minutes.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE duration_of_trip_mins <= 93 
and bike_type="classic_bike"

ORDER BY duration_of_trip_mins


```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_biketripdatasinglerides_c**"

Step 9. Finally I create a third table for all ride durations for non classic bikes.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_casual.2022_xx_biketripedatawithduration_c` 

WHERE bike_type!="classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_biketripdata_nonclassicbikes_c**"

Step 10. Now to sort out a table of classic bike day rides who pay no more than $15, I used the following query:

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins <= 180 

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_dayridesnotmorethan3hours_c**".

Step 11. To further organise the classic bike day rides, a table is created to filter out the results for day rides with trip duration above 180 minutes (3 hours) but not more than 1440 minutes (a day).

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
duration_of_trip_mins - 180 AS trip_duration_charged_by_min,
type_of_rider 

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins > 180 and duration_of_trip_mins <= 1440

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_dayrideswithinoneday_c**"

Step 12. Finally, a table for classic bike day rides that exceed 1 day is created with the following query:

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
1440 - 180 AS day1_trip_of_duration_charged_by_min,
CASE
  WHEN duration_of_trip_mins > 1620 
  THEN "duration_of_trips_mins - 1620"
  ELSE NULL 
  END AS day2_trip_of_duration_charged_by_min,
type_of_rider,

 FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatadayrides_c` 

WHERE duration_of_trip_mins > 1440

ORDER BY duration_of_trip_mins
```

This creates two extra columns which calculates the trip duration charged by minutes for day 1 as well as day 2 if the trip duration for day 2 exceeds 3 hours. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_dayridesaboveoneday_c**".

The above steps 1 to 12 are repeated for all 12 months of casual riders data.

### Member riders data

Step 2. To determine the day of week spent on each ride, I extracted the start and end dates of the rides. For the members, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT  

ride_id,
rideable_type AS bike_type,
EXTRACT(DATE FROM started_at) AS start_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM started_at)) AS start_day,
EXTRACT(DATE FROM ended_at) AS end_date,
FORMAT_DATE("%A", EXTRACT(DATE FROM ended_at)) AS end_day,
type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_member.2022-xx-biketripdata-m`

WHERE rideable_type = "classic_bike"

ORDER BY started_at
```

Then I choose "Save Results" another BigQuery table and name it "**2022_xx_mclassicbikedata_dayofweek**". I repeat this step for rideable_type = "electric_bike" (**2022_xx_melectricbikedata_dayofweek**).

To determine number of classic bike rides from members for every day of each month, I used the following SQL query on each table:
```{mysql eval=FALSE}
SELECT 

start_day,
COUNT(ride_id) AS number_of_rides,

 FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_mclassicbikedata_dayofweek` 

 GROUP BY start_day
```

Then I choose "Save Results" as local CSV file and name it "**2022_xx_mclassicbikedata_bydays**".I repeat this step for electric_bike (**2022_xx_melectricbikedata_bydays**).

Step 3. To determine the time spent on each ride, columns are separated into date and time. For the member riders, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT  

  rideable_type AS bike_type,
  DATETIME_DIFF(ended_at, started_at, MINUTE) AS duration_of_trip_mins,
  type_of_rider

FROM `case-study-divvybikes.2022_biketripdata_member.2022-xx-biketripdata-m` 
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_biketripedatawithduration_m**".

Step 4. To determine the duration of time spent on classic bikes by members, I run the following query:

```{mysql eval=FALSE}
SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose “Save Results” as another BigQuery table and name it “**2022_xx_mclassicbikedata_duration**”.

Step 5. To determine the duration of time spent on electric bikes by members, I run the following query:

```{mysql eval=FALSE}
SELECT  

*

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins
```

Then I choose “Save Results” as another BigQuery table and name it “**2022_xx_melectricbikedata_duration**”.

To determine minimum, maximum, average and total duration of classic bike rides from members for each month, I used the following SQL query on each table:

```{mysql eval=FALSE}
SELECT  

bike_type,
MIN(duration_of_trip_mins) AS min_duration_of_trips_mins,
MAX(duration_of_trip_mins) AS max_duration_of_trips_mins,
AVG(duration_of_trip_mins) AS average_duration_of_trips_mins,
SUM(duration_of_trip_mins) AS total_duration_od_trips_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_mclassicbikedata_duration` 

GROUP BY bike_type
```

Then I choose "Save Results" as local CSV file and name it "**2022_xx_mclassicbikedata_bymins**".I repeat this step for electric_bike (**2022_xx_melectricbikedata_bymins**).

Step 6. Next, I filter out members who travel for less than 45 minutes each ride using classic bikes.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE duration_of_trip_mins <= 45
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikeridesnotabove45mins_m**".

Step 7. Next, I filter out members who travel for more than 45 minutes each ride using classic bikes. I also calculate a new column of trip duration charged by minutes.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
duration_of_trip_mins - 45 AS trip_duration_charged_by_min,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE duration_of_trip_mins > 45
and bike_type= "classic_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikeridesabove45mins_m**".

Step 8. Next, I filter out members who travel using electric bikes.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type = "electric_bike"

ORDER BY duration_of_trip_mins
```

Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_electricbikerides_m**".

Step 9. Finally as a safety check, I query for all results that are neither classic bike or electric bike.

```{mysql eval=FALSE}
SELECT  

bike_type,
duration_of_trip_mins,
type_of_rider 

FROM `case-study-divvybikes.2022_biketripdata_member.2022_xx_biketripedatawithduration_m` 

WHERE bike_type!="electric_bike" AND bike_type!="classic_bike"

ORDER BY duration_of_trip_mins
```

If there is a table for this query, I choose "Save Results" as another BigQuery table and name it "**2022_xx_otherbikerides_m**".

The above steps 1 to 9 are repeated for all 12 months of member riders data.

## Summarized Data For Revenue Calculation (SQL)

### Casual riders data
Step 1. For ease of calculating revenues, I query the table about classic bike single rides for casual riders using the following SQL query:

```{mysql eval=FALSE}
SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_mins

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_xx_biketripdatasinglerides_c`

GROUP BY bike_type
```

This counts the total number of rides on classic bikes for single ride subscribers and also the total duration of all single rides on classic rides. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikesinglerides_c**".

Step 2. Then I create a table that summarises number of one day rides not above 3 hours.

```{mysql eval=FALSE}
SELECT  

bike_type,
COUNT(bike_type) AS number_of_rides

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayridesnotmorethan3hours_c` 

GROUP BY bike_type
```

This counts the total number of rides on classic bikes for day ride subscribers with rides not exceeding 3 hours. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikesonedayrides_notabove3hours_c**".

Step 3. Next , I create a table that summarises number of one day rides and total duration charged by minutes.

```{mysql eval=FALSE}
SELECT  

bike_type,
COUNT(bike_type) AS no_of_oneday_rides,
SUM(trip_duration_charged_by_min) AS total_duration_charged_by_min

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayrideswithinoneday_c`

GROUP BY bike_type
```

This counts the total number of rides on classic bikes for one day ride subscribers and total duration charged by minutes. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikesonedayrides_notabove1day_c**".

Step 4. Next, I create a table that summarises number of two day rides and total duration charged by minutes for first day.
```{mysql eval=FALSE}
SELECT  

bike_type,
COUNT(bike_type) AS no_of_twoday_rides,
SUM(day1_trip_of_duration_charged_by_min) AS total_day1_trip_duration_charged_by_min

FROM `case-study-divvybikes.2022_classicbiketripdata_casual.2022_01_dayridesaboveoneday_c`

GROUP BY bike_type
```

This counts the total number of rides on classic bikes for two day ride subscribers and total duration charged by minutes for first day. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikestwodayrides_c**".

Step 5. Finally, I create a table that summarizes number of rides and total duration of rides charged by other bike types.

```{mysql eval=FALSE}
SELECT  

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_charged

FROM `case-study-divvybikes.2022_nonclassicbiketripdata_casual.2022_xx_biketripdata_nonclassicbikes_c`

GROUP BY bike_type
```

This counts the total number of rides on other bikes for casual riders and total duration charged by minutes. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_otherbikerides_c**".

The above steps 1 to 5 are repeated for all 12 months of casual riders data.

### Member riders data

Step 1. For ease of calculating revenues, I query the table about classic bike rides not above 45 minutes for members using the following SQL query:

```{mysql eval=FALSE}
SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_trip_duration

`case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_classicbikeridesnotabove45mins_m`

GROUP BY bike_type
```

This counts the total number of rides on classic bikes for members whose rides do not exceed 45 minutes. In other words, each ride is only charged monthly. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikerideschargedmonthly_m**".

Step 2. Then I create a table that summarises number of classic bike rides above 45 minutes for members.

```{mysql eval=FALSE}

SELECT  

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(trip_duration_charged_by_min) AS total_trip_duration_charged

FROM `case-study-divvybikes.2022_classicbiketripdata_member.2022_xx_classicbikeridesabove45mins_m` 

GROUP BY bike_type
```

This counts the total number of classic bike rides for members and total duration charged by minutes. In other words, each ride has a monthly charge as well as a charge per minute exceeding 45 minutes. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_classicbikerideswithminutecharges_m**".

Step 3.

```{mysql eval=FALSE}

SELECT 

bike_type,
COUNT(bike_type) AS no_of_rides,
SUM(duration_of_trip_mins) AS total_duration_charged

FROM `case-study-divvybikes.2022_nonclassicbiketripdata_member.2022_xx_electricbikerides_m`

GROUP BY bike_type
```

This counts the total number of classic bike rides for members and total duration charged by minutes. In other words, each ride has a monthly charge as well as a charge per minute exceeding 45 minutes. Then I choose "Save Results" as another BigQuery table and name it "**2022_xx_electricbikerideschargedbymin_m**".

The above steps 1 to 3 are repeated for all 12 months of member riders data.

____________________________________________________
# Data Visualization in Excel, Tableau and R

## Excel

After creating all those monthly tables in SQL BigQuery, I downloaded them into Excel and put them together into annual tables of data to compare trends across months. The following Excel tables are assembled:

* 2022_annual_classicbikedata_bydays_casual
* 2022_annual_classicbikedata_bydays_member
* 2022_annual_classicbikedata_bymins_casual
* 2022_annual_classicbikedata_bymins_member
* 2022_annual_dockedbikedata_bydays_casual
* 2022_annual_dockedbikedata_bymins_casual
* 2022_annual_electricbikedata_bydays_casual
* 2022_annual_electricbikedata_bydays_member
* 2022_annual_electricbikedata_bymins_casual
* 2022_annual_electricbikedata_bymins_member
* 2022rev_classicbikeonedayrides_notabove1day_c
* 2022rev_classicbikeonedayrides_notabove3hoursperride_c
* 2022rev_classicbikerideswithmonthly&mincharges_m
* 2022rev_classicbikerideswithonlymonthlycharge_m
* 2022rev_classicbikesinglerides_c
* 2022rev_classicbiketwodayrides_c
* 2022rev_dockedbikerides_c
* 2022rev_electricbikerides_c
* 2022rev_electricbikechargedbymin_m

Based on these tables, I used Excel to draw up charts to visualize bike usage and revenue patterns throughout the year.

## R

I also attempted to draw line plot on one data table of the classic bike riding data for casual users using the following code in R.

```{r}
library(tidyverse)
library(dplyr)
library(ggplot2)
library(data.table)
library(readr)
X2022_annual_classicbikedata_bymonth_casual <- read_csv("bike_trips_data_processed/2022_annual_classicbikedata_bymonth_casual.csv")

## Setting a data frame as data table
setDT(X2022_annual_classicbikedata_bymonth_casual) 

## Sorts table into long form 
df1 <- melt(X2022_annual_classicbikedata_bymonth_casual,"Months")
print(df1) 

## Reorders Months from Jan to Dec
df1$Months = factor(df1$Months, levels = month.abb) 

## Actual plot
ggplot(df1, aes(x=Months, y=value, group=variable, colour=variable)) +
  geom_line()+
  geom_point()+
  ylab("Number of Rides")+
  labs(colour="Days of Week")+
  labs(title="Classic Bike Rides - Casual")
```

I drew a bar chart on another data table of the electric bike riding data for members using the following code in R.

```{r}
library(tidyverse)
library(dplyr)
library(ggplot2)
library(data.table)
library(readr)
X2022_annual_electricbikedata_byduration_member <- read_csv("bike_trips_data_processed/2022_annual_electricbikedata_byduration_member.csv")
X2022_annual_electricbikedata_byavgduration_member <- read_csv("bike_trips_data_processed/2022_annual_electricbikedata_byavgduration_member.csv")

## Setting data frames as data tables
setDT(X2022_annual_electricbikedata_byduration_member)
setDT(X2022_annual_electricbikedata_byavgduration_member)

## Sorts table into long form 
df2 <- melt(X2022_annual_electricbikedata_byduration_member,"month")
print(df2) 
df3 <- melt(X2022_annual_electricbikedata_byavgduration_member,"month")
print(df3)

## Reorders Months from Jan to Dec
df2$month = factor(df2$month, levels = month.abb)
df3$month = factor(df3$month, levels = month.abb) 

## Actual plots
ggplot(df2, aes(month, value, fill=variable, label=value)) +
  geom_bar(stat="identity", position = "dodge")+
  geom_text(vjust=-0.5, size=1.5)+
  ylab("Duration (Days)")+
  labs(fill="")+
  labs(title="Electric Bike Rides - Member")

ggplot(df3, aes(month, value, fill=variable, label=value)) +
  geom_bar(stat="identity", position = "dodge")+
  geom_text(vjust=-0.5, size=1.5)+
  ylab("Duration (Mins)")+
  labs(fill="")+
  labs(title="Electric Bike Rides - Member")
```


## Tableau

I used Tableau to give better visualisations on the starting location of bike usage by casual riders as well as trip revenue from classic bike for members in 2022.

[**Viz 1: Trip Revenue of Classic Bike Rides for Members**](https://public.tableau.com/views/BikeSharingCaseStudy_16734316053840/Sheet1?:language=en-US&:display_count=n&:origin=viz_share_link) 


[**Viz 2: Starting Location for Casual Riders**](https://public.tableau.com/views/BikeSharingCaseStudyViz2/Sheet2?:language=en-US&:display_count=n&:origin=viz_share_link)

____________________________________________________
# Analysis & Recommendations

## Riding patterns of casual riders and members

The following comparisons were made between casual riders and members:

1. Riding patterns every day of the week by each month of 2022
2. Riding patterns every month of 2022 by each day of week
3. Revenue patterns for 2022 for classic and electric bikes

Since we cannot trace each ride id to each rider, it is impossible to determine the true revenue from casual riders and members unless we are given more information about membership subscription revenue and day pass revenues. Alternatively, it is better to compare trip revenue charged by minute for each type of bike. 

### Classic bikes

For both casual riders and members, number of rides for classic bikes seem to peak around the middle of the year and decline by the end of the year. For casual riders, the peak is in July on Saturdays, reaching 40k rides. For members, the peak is in Jun on Thursdays, reaching 42k rides. For both casual riders and members, number of rides for classic bikes are lowest in January and February. For casual riders, number of rides are generally higher in June and July while that for members is in June. **In general, members have higher number of rides for classic bikes than casual riders.**

![**Number of classic bike rides of causal riders by days**](casual_graphics/classicbikerides_casual_bydaysofweek.jpg)

![**Number of classic bike rides of members by days**](member_graphics/classicbikerides_member_bydaysofweek.jpg)

![**Number of classic bike rides of casual riders by month**](casual_graphics/classicbikerides_casual_bymonth.jpg)

![**Number of classic bike rides of members by month**](member_graphics/classicbikerides_member_bymonth.jpg)


However, if we look at duration of rides, **casual riders have higher total duration of rides at its peak compared to members.** This could be due to the greater number of casual riders than members of classic bikes or due to the longer time casual riders spend on classic bike rides during the peak month (June). This is answered partly by looking at the average duration of each ride on classic bikes by casual riders and members. **In general, casual riders spend a longer average time on classic bikes** (over 20 minutes per ride every month) with the highest average duration being 32 minutes in March. Members, on the other hand, spend less than 20 minutes per ride every month with highest average duration being 15 minutes in July.

![**Total duration on classic bikes by casual riders**](casual_graphics/classicbikerides_casual_bytotalduration.jpg)

![**Total duration on classic bikes by members**](member_graphics/classicbikerides_member_bytotalduration.jpg)

![**Average duration each ride on classic bikes by casual riders**](casual_graphics/classicbikerides_casual_byavgduration.jpg)

![**Average duration each ride on classic bikes by members**](member_graphics/classicbikerides_member_byavgduration.jpg)

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for members are generally higher than casual riders. The total number of rides for casual members tend to peak on Saturday whilst that of members tend to be on Tuesdays. **This could be due to the fact that casual riders tend to use classic bikes for leisure while members tend to use classic bikes for work travel.**

![**Total classic bike rides of casual riders by days**](casual_graphics/classicbikerides_casual_totalsbydaysofweek.jpg)

![**Total classic bike rides of members by days**](member_graphics/classicbikerides_member_totalsbydaysofweek.jpg)

![**Total classic bike rides of casual riders by months**](casual_graphics/classicbikerides_casual_totalsbymonth.jpg)

![**Total classic bike rides of members by months**](member_graphics/classicbikerides_member_totalsbymonth.jpg)

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur for single rides. Thus they may choose to ride lesser number of times but extend the duration of each ride since the cost of $0.16 per minute is the same as members for classic bikes. On the other hand, the number of rides is irrelevant to members who do not have to pay unlock costs per ride but they may want to save money by limiting each ride duration to no more than 45 minutes as much as possible. 

For classic bikes, trip revenues for both members and casual riders tend to peak in June. Trip revenue for members are only more profitable than casual riders who buy the full day pass for one day. Trip revenue is highest for casual riders who buy single rides.

![**Trip revenue of classic bike for members**](member_graphics/triprevenueclassicbike_member.jpg)

![**Trip revenue of classic bike full day pass for casual riders**](casual_graphics/triprevenueclassicbikefulldaypass_casual.jpg)

![**Trip revenue of classic bike single rides for casual riders**](casual_graphics/triprevenueclassicbikesingleridepass_casual.jpg)

![**Trip revenue of classic bike two full day passes for casual riders**](casual_graphics/triprevenueclassicbiketwofulldaypass_casual.jpg)

### Electric bikes

For casual riders, number of rides for electric bikes seem to peak around July while for members, the peak happens in September and decline by the end of the year. For casual riders, the peak is on Saturdays, reaching 47k rides. For members, the peak is on Thursdays, reaching 39k rides. For both casual riders and members, number of rides for electric bikes are lowest in January and February. For casual riders, number of rides are generally higher in July while that for members is in August. **In general, members have higher number of rides for electric bikes than casual riders.**

![**Number of electric bike rides of causal riders by days**](casual_graphics/electricbikerides_casual_bydaysofweek.jpg)

![**Number of electric bike rides of members by days**](member_graphics/electricbikerides_member_bydaysofweek.jpg)

![**Number of electric bike rides of casual riders by month**](casual_graphics/electricbikerides_casual_bymonth.jpg)

![**Number of electric bike rides of members by month**](member_graphics/electricbikerides_member_bymonth.jpg)

However, if we look at duration of rides, **casual riders have higher total duration of rides at its peak compared to members.** This could be due to the greater number of casual riders than members of electric bikes or due to the longer time casual riders spend on electric bike rides during the peak month. This is answered partly by looking at the average duration of each ride on electric bikes by casual riders and members. **In general, casual riders spend a longer average time on electric bikes** with the highest average duration being 19 minutes per ride in May. Members, on the other hand, spend less than 15 minutes every month with highest average duration being 13 minutes per ride in June.

![**Total duration on electric bikes by casual riders**](casual_graphics/electricbikerides_casual_bytotalduration.jpg)

![**Total duration on electric bikes by members**](member_graphics/electricbikerides_member_bytotalduration.jpg)

![**Average duration each ride on electric bikes by casual riders**](casual_graphics/electricbikerides_casual_byavgduration.jpg)

![**Average duration each ride on electric bikes by members**](member_graphics/electricbikerides_member_byavgduration.jpg)

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for members are generally higher than casual riders. The total number of rides for casual members tend to peak on Saturday whilst that of members tend to be on Thursdays. **This could be due to the fact that casual riders tend to use electric bikes for leisure while members tend to use electric bikes for work travel.**

![**Total electric bike rides of casual riders by days**](casual_graphics/electricbikerides_casual_totalsbydaysofweek.jpg)

![**Total electric bike rides of members by days**](member_graphics/electricbikerides_member_totalsbydaysofweek.jpg)

![**Total electric bike rides of casual riders by months**](casual_graphics/electricbikerides_casual_totalsbymonth.jpg)

![**Total electric bike rides of members by months**](member_graphics/electricbikerides_member_totalsbymonth.jpg)

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur. Thus they may choose to ride lesser number of times but extend the duration of each ride. But since the charges on classic bikes are cheaper per minute, they would spend more time on average riding classic bikes than electric bikes. On the other hand, the number of rides is irrelevant to members who do not have to pay unlock costs per ride but they may want to save costs by spending more time on average on classic bikes since there are no charges for the first 45 minutes per ride. 

For electric bikes, trip revenues for casual riders tend to peak in July whilst that for members tend to peak in August. Trip revenue for casual riders are always more profitable than members.

![**Trip revenue of electric bike for members**](member_graphics/triprevenueelectricbike_member.jpg)

![**Trip revenue of classic bike for casual riders**](casual_graphics/triprevenueelectricbikes_casual.jpg)

### Docked bikes

Since we do not have data for members who ride docked bikes, we look at riding patterns for casual riders instead. For casual riders, number of rides for docked bikes seem to peak at 8k rides around July on Saturdays and decline by the end of the year. Number of rides for docked bikes are lowest in December, January and February. Number of rides are generally higher in June. **In general, casual riders have lowest number of rides for docked bikes.**

![**Number of docked bike rides of causal riders by days**](casual_graphics/dockedbikerides_casual_bydaysofweek.jpg)

![**Number of classic bike rides of causal riders by days**](casual_graphics/classicbikerides_casual_bydaysofweek.jpg)

![**Number of docked bike rides of causal riders by days**](casual_graphics/electricbikerides_casual_bydaysofweek.jpg)

![**Number of docked bike rides of casual riders by month**](casual_graphics/dockedbikerides_casual_bymonth.jpg)

![**Number of classic bike rides of casual riders by month**](casual_graphics/classicbikerides_casual_bymonth.jpg)

![**Number of electric bike rides of casual riders by month**](casual_graphics/electricbikerides_casual_bymonth.jpg)

However, if we look at duration of rides, **docked bikes have second highest total duration of rides at its peak compared to other bike types.** This could be due to the longer time casual riders spend on docked bike rides during the peak month since the number of rides are very low. This is answered by looking at the average duration of each ride on docked bikes by casual riders. **In general, casual riders spend the longest average time on docked bikes compared to other bikes** with the highest average duration being 223 minutes in Jan.

![**Total duration on docked bikes by casual riders**](casual_graphics/dockedbikerides_casual_bytotalduration.jpg)

![**Total duration on classic bikes by casual riders**](casual_graphics/classicbikerides_casual_bytotalduration.jpg)

![**Total duration on electric bikes by casual riders**](casual_graphics/electricbikerides_casual_bytotalduration.jpg)

![**Average duration each ride on docked bikes by casual riders**](casual_graphics/dockedbikerides_casual_byavgduration.jpg)

![**Average duration each ride on classic bikes by casual riders**](casual_graphics/classicbikerides_casual_byavgduration.jpg)

![**Average duration each ride on electric bikes by casual riders**](casual_graphics/electricbikerides_casual_byavgduration.jpg)

If we look at the total number of rides by each day of the week or by month, similar patterns occur. The total number of rides for casual riders tend to peak on Saturday and in June and July. **This could be due to the fact that casual riders tend to use docked bikes for leisure.**

![**Total docked bike rides of casual riders by days**](casual_graphics/dockedbikerides_casual_totalsbydaysofweek.jpg)

![**Total classic bike rides of casual riders by days**](casual_graphics/classicbikerides_casual_totalsbydaysofweek.jpg)

![**Total electric bike rides of casual riders by days**](casual_graphics/electricbikerides_casual_totalsbydaysofweek.jpg)

![**Total docked bike rides of casual riders by months**](casual_graphics/dockedbikerides_casual_totalsbymonth.jpg)

![**Total classic bike rides of casual riders by months**](casual_graphics/classicbikerides_casual_totalsbymonth.jpg)

![**Total electric bike rides of casual riders by months**](casual_graphics/electricbikerides_casual_totalsbymonth.jpg)

This all makes sense as casual riders would want to spend more time on each ride due to the unlock cost of 1 dollar per ride they incur. Thus they may choose to ride lesser number of times but extend the duration of each ride. **But it is unclear why docked bikes have the highest average duration of rides compared to other bikes for casual riders.**

For docked bikes, trip revenues for casual riders tend to peak in June. Trip revenue for casual riders are second most profitable compared to other bikes (after electric bikes) assuming a $0.39 charge per minute.

![**Trip revenue of electric bike for members**](casual_graphics/triprevenuedockedbikes_casual.jpg)


## Solutions to business task

#### **Q1. Who should we target for membership marketing?**

#### Classic bikes

For frequent casual riders of classic bikes, we need to look at how much they will spend if they switch from casual riding plans to membership to determine whether membership subscription yields higher profits. 

We assume that casual riders will always go for the cheapest options available to them without membership. Let us examine some scenarios.

For every member who rides no more than 45 minutes per ride, annual revenue from this member is only the annual membership fee of 120 dollars. But for casual riders who rides no more than 45 minutes per ride, annual revenue per rider is at least 123 dollars if the number of rides each year is above 14. The calculation of annual revenues are as follows:

* Annual revenue (casual rider) = (45 (mins) x 0.16 (/min) + 1 (unlock cost)) x 15 (rides/year) = $123

* Annual revenue (member) = 10 (/mth) x 12 (mths) = $120

For every member who rides for 3 hours per ride, annual revenue from each member is around 8,000 dollars since rides above 45 minutes are charged at $0.16 per minute. But for casual riders who rides no more than 3 hours per ride, annual revenue per rider is 5,475 dollars as they will choose the day plan of 15 dollars each day as the cheaper option. The calculation of annual revenues are as follows:

* Annual revenue (casual rider) = 365 (days) x 15 (/day) = $5,475

* Annual revenue (member) = ((180-45) (mins) x 0.16 (/min) x 365 (days) + 120 (membership fee) = $8,004

For every member and rider who rides more than 3 hours per ride, the cost per ride after 3 hours is the same, so which plan is more profitable depends on the revenue for the first 3 hours. Thus it is similar to the scenario above. The calculation of annual revenues are as follows:

* Annual revenue (casual rider) = 365 (days) x 15 (/day) = $5,475

* Annual revenue (member) = ((180-45) (mins) x 0.16 (/min) x 365 (days) + 120 (membership fee) = $8,004

Thus, I propose the following recommendations to maximize revenue:

* **For casual riders who ride no more than 45 minutes per ride, only promote membership to them if they ride for less than 15 rides each year on classic bikes.**

* **For casual rider who ride more than 45 minutes per ride, always promote membership to them. Membership will always bring in more revenue regardless of the number of rides each year.**

#### Electric bikes and docked bikes

For frequent casual riders of electric bikes, we need to look at how much they will spend if they switch from casual riding plans to membership to determine whether membership subscription yields higher profits. Let us examine some scenarios.

For every member who rides 1 minute per ride and only once per day, annual revenue from this member is 178 dollars given the $0.16 per min charge. But for casual riders who ride the same frequency, annual revenue per rider is at least 507 dollars. The calculation of annual revenues are as follows:

* Annual revenue (casual rider) = (0.39 (/min) x 1 (min) + 1 (unlock cost)) x 365 (days) = $507 

* Annual revenue (member) = (0.16 (/min) x 1 (min) x 365 (days)) + 120 (membership fee) = $178

Membership only becomes more profitable under this revenue calculation:

* Annual revenue (member) = (0.35 (/min) x 1 (min) x 365 (days)) + 380 (membership fee) = $508


Similarly for scooters, revenue from casual riders are higher.

The calculation of annual revenues are as follows:

* Annual revenue (casual rider) = (0.39 (/min) x 1 (min) + 1 (unlock cost)) x 365 (days) = $507 

* Annual revenue (member) = (0.25 (/min) x 1 (min) x 365 (days)) + 120 (membership fee) = $211

Membership only becomes more profitable under this revenue calculation:

* Annual revenue (member) = (0.35 (/min) x1 (min) x 365 (days)) + 380 (membership fee) = $508

Thus, I propose the following recommendations to maximise revenue:

* **Raise membership fees to minimum 380 dollars per year or 32 dollars per month and cost per minute to $0.35 for all new members and promote membership to casual riders who ride electric bikes and scooters only once per day and 1 min per ride.** 

#### **Q2. How should we target casual riders based on their riding patterns?**

Since casual riders prefer to use all types of bikes for leisure from the preliminary data we have, we can promote membership subscription by months ($32 per month) during peak periods to maximise revenue for electric and docked bikes. 

For classic bikes, the original membership fee of 120 dollars per year can be promoted to casual riders who meet the criteria of riding above 45 minutes per ride.

#### **Q3. What other data do we need to gather or collect?**

To better access the profitability of our membership subscription, full day passes and better promote membership based on preferences of casual riders, we need to gather or collect the following information:

1) Survey on the kind of bikes casual riders prefer to use and the purpose (for work or leisure)
2) Current total revenue from membership subscriptions
3) Total revenue from full days passes
4) Number of casual riders who fit into the profile to promote membership to 
5) Information on why docked bikes have highest ride duration

____________________________________________________
# Appendix 

## Sources of data

1. Original trip data by divvybikes: https://divvy-tripdata.s3.amazonaws.com/index.html
2. Pricing information by divvybikes: https://divvybikes.com/

