Provide marketing strategies to convert casual riders into member riders
How do members riders and causual riders differ in their use of Cyclistic bikes?
Why do regular riders purchase Cyclistic’s annual membership?
How can Cyclistic utilize digital platforms to influence regular riders to become members?
Moreno assigned me the first question to answer: How do members riders and causual riders differ in their use of Cyclistic bikes?
I utilize Cyclistc’s historical data to conduct analysis and identify trends over the first 5 months of 2023
Data has been collected from divvy_tripdata. Provided by Motivate International under this license
Utilize data from 01/2023 to 05/2023
Install Packages to connect with SQL
install.packages("odbc")
install.packages("DBI")
install.packages("tidyverse")
In there:
Package “odbc” allow connect with Database of SQL Server
Package “DBI” allow interact with Database
Utilize libary of packages
library(odbc)
library(DBI)
library(ggplot2)
Connect to SQL Server
Combine data of 5 tables into single table
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'divvy_tripdata_2023')
BEGIN
SELECT * INTO divvy_tripdata_2023
FROM (
SELECT * FROM divvy_tripdata_01_2023
UNION ALL
SELECT * FROM divvy_tripdata_02_2023
UNION ALL
SELECT * FROM divvy_tripdata_03_2023
UNION ALL
SELECT * FROM divvy_tripdata_04_2023
UNION ALL
SELECT * FROM divvy_tripdata_05_2023 ) AS row_data
END
SELECT COUNT(*) AS count
FROM divvy_tripdata_2023
| count |
|---|
| 1670841 |
In this way, ‘divvy_tripdata_2023’ table has been created with 1,607,841 records
The following table shows all field name and data type of table
‘divvy_tripdata_2023’
The query bellow shows the number of NULL vaule per field
SELECT
COUNT(CASE WHEN ride_id IS NULL THEN 1 ELSE NULL END) AS ride_id,
COUNT(CASE WHEN rideable_type IS NULL THEN 1 ELSE NULL END) AS rideable_type,
COUNT(CASE WHEN started_at IS NULL THEN 1 ELSE NULL END) AS started_at,
COUNT(CASE WHEN ended_at IS NULL THEN 1 ELSE NULL END) AS ended_at,
COUNT(CASE WHEN start_station_name IS NULL THEN 1 ELSE NULL END) AS start_station_name,
COUNT(CASE WHEN start_station_id IS NULL THEN 1 ELSE NULL END) AS start_station_id,
COUNT(CASE WHEN end_station_name IS NULL THEN 1 ELSE NULL END) AS end_station_name,
COUNT(CASE WHEN end_station_id IS NULL THEN 1 ELSE NULL END) AS end_station_id,
COUNT(CASE WHEN start_lat IS NULL THEN 1 ELSE NULL END) AS start_lat,
COUNT(CASE WHEN start_lng IS NULL THEN 1 ELSE NULL END) AS start_lng,
COUNT(CASE WHEN end_lat IS NULL THEN 1 ELSE NULL END) AS end_lat,
COUNT(CASE WHEN end_lng IS NULL THEN 1 ELSE NULL END) AS end_lng,
COUNT(CASE WHEN member_casual IS NULL THEN 1 ELSE NULL END) AS member_casual
FROM divvy_tripdata_2023
| ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 241158 | 241290 | 256913 | 257054 | 0 | 0 | 1571 | 1571 | 0 |
Observation:
Due to rise_id field is a Primary Key, there will be no duplicate values
The current format of the ‘divvy_tripdata_2023’ table is yyyy-mm-dd hh:mm:ss. To perform deeper analysis, I will create 4 new columns with respective value types: Hour, Day of Week, Month, and Year.
Create a new column named rise_length to determine the duration of each trip (in minute format).
All records containing NULL values will be removed
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'pre_clean_combined_data')
BEGIN
SELECT * INTO pre_clean_combined_data
FROM (
SELECT ride_id, rideable_type,
CAST(DATENAME(hh, started_at) AS INT) AS hour,
DATENAME(dw, started_at) AS day_of_week,
DAY(started_at) AS day,
MONTH(started_at) AS month,
YEAR(started_at) AS year,
-- Cast the data type of the DateDiff function result to the round result of the division to 2 decimal places
ROUND(CAST(DATEDIFF(s, started_at, ended_at) AS float)/60, 2) AS ride_length_minute,
start_station_name, end_station_name, start_lat, start_lng,
end_lat, end_lng, member_casual
FROM divvy_tripdata_2023
WHERE start_station_name IS NOT NULL AND
start_station_id IS NOT NULL AND
end_station_name IS NOT NULL AND
end_station_id IS NOT NULL AND
end_lat IS NOT NULL AND
end_lng IS NOT NULL
) AS cleaned
END
-- Sorting ride_length_minute column in ascending
SELECT TOP(10) * FROM pre_clean_combined_data
ORDER BY ride_length_minute ASC
| ride_id | rideable_type | hour | day_of_week | day | month | year | ride_length_minute | start_station_name | end_station_name | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A769AB597DEA18C4 | classic_bike | 5 | Saturday | 27 | 5 | 2023 | -0.23 | Southport Ave & Belmont Ave | Southport Ave & Belmont Ave | 41.93948 | -87.66375 | 41.93948 | -87.66375 | member |
| 934174DB8E2AD791 | classic_bike | 17 | Monday | 29 | 5 | 2023 | -0.20 | Lakefront Trail & Bryn Mawr Ave | Lakefront Trail & Bryn Mawr Ave | 41.98404 | -87.65228 | 41.98404 | -87.65228 | member |
| 00AC4040E25E347E | classic_bike | 15 | Sunday | 7 | 5 | 2023 | -0.18 | Wentworth Ave & Cermak Rd* | Wentworth Ave & Cermak Rd* | 41.85321 | -87.63187 | 41.85321 | -87.63187 | casual |
| 97BF63D06721A3B9 | classic_bike | 18 | Saturday | 13 | 5 | 2023 | -0.10 | Wentworth Ave & Cermak Rd* | Wentworth Ave & Cermak Rd* | 41.85321 | -87.63187 | 41.85321 | -87.63187 | member |
| 0063C3704F56EC55 | electric_bike | 7 | Thursday | 27 | 4 | 2023 | -0.08 | Green St & Randolph St* | Green St & Randolph St* | 41.88346 | -87.64860 | 41.88360 | -87.64863 | casual |
| 81E1C5175FA5A23D | classic_bike | 14 | Wednesday | 19 | 4 | 2023 | -0.07 | Green St & Randolph St* | Green St & Randolph St* | 41.88360 | -87.64863 | 41.88360 | -87.64863 | member |
| 7A4D237E2C99D424 | electric_bike | 17 | Tuesday | 4 | 4 | 2023 | -0.05 | Green St & Randolph St* | Green St & Randolph St* | 41.88361 | -87.64856 | 41.88360 | -87.64863 | member |
| 579596DD4C7C7538 | classic_bike | 17 | Tuesday | 23 | 5 | 2023 | -0.05 | Wentworth Ave & Cermak Rd* | Wentworth Ave & Cermak Rd* | 41.85321 | -87.63187 | 41.85321 | -87.63187 | casual |
| 1CEB299B0AACB63F | electric_bike | 16 | Wednesday | 19 | 4 | 2023 | 0.00 | Desplaines St & Kinzie St | Desplaines St & Kinzie St | 41.88850 | -87.64436 | 41.88872 | -87.64445 | casual |
| BF7BE067A7FD077A | electric_bike | 7 | Tuesday | 18 | 4 | 2023 | 0.00 | California Ave & Division St | California Ave & Division St | 41.90303 | -87.69754 | 41.90303 | -87.69747 | member |
Observation:
The result-set in ride_length_minute column have 8 values with value < 0
The cause may be due to user error or system error when inserting data into the database
-- Remove records that *ride_length_minute* value < 0
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'clean_combined_data')
BEGIN
SELECT * INTO clean_combined_data
FROM (
SELECT * FROM pre_clean_combined_data
wHERE ride_length_minute >= 0 ) AS completely_clean
END
-- Count the number of records in *clean_combined_data*
SELECT COUNT(*) AS count
FROM clean_combined_data
| count |
|---|
| 1285707 |
In this way, after the data clean step, a new table named clean_combined_data will be obtained with 1,285,707 records
After identifying the differences between casual and member riders, marketing strategies to target casual riders can be developed to persuade them to become members. Recommendations:
Marketing campaigns might be conducted in spring and summer at tourist/recreational locations popular among casual riders.
Casual riders are most active on weekends and during the summer and spring, thus they may be offered seasonal or weekend-only memberships.
Casual riders use their bikes for longer durations than members. Offering discounts for longer rides may incentivize casual riders and entice members to ride for longer periods of time.