The primary objective of this project is to analyze usage patterns and behavior of Cyclistic’s casual riders and annual members. By gaining insights into their preferences and behaviors, we aim to design an effective marketing strategy to convert casual riders into annual members, ultimately optimizing the company’s revenue and enhancing user engagement.
The data was provided by Motivate International Inc. The data were stored in different csv files which are in months or in quarters.
These packages are necessary for the SQL queries to run on R Markdown
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(DBI)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.3 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(bigrquery)
bq_auth(path = "C:\\Users\\ARTHUR\\OneDrive\\Documents\\my_service_account_key.json")
Establish a connection between the R Markdown document and BigQuery.
con <- dbConnect(
bigrquery::bigquery(),
project = "sharp-messenger-396703"
)
*After a thorough examinations of data, i decided to make use of the data from the year 2022. Since they were in separate csv files, i created a table which contains data from the 12 months of the year 2022 exculding rows with NULL values.
SELECT *
FROM `sharp-messenger-396703.Case_Study_a.CyclisticData`
LIMIT 10
| ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A80F66E86B532671 | electric_bike | 2022-08-12 10:45:23 | 2022-08-12 11:18:03 | WestChi | DIVVY 001 - Warehouse test station | Normal Ave & Archer Ave | TA1308000014 | 41.93242 | -87.64276 | 41.84953 | -87.64059 | casual |
| 75F38E47520D04F5 | electric_bike | 2022-08-02 16:21:10 | 2022-08-02 16:30:11 | Smith Park | 643 | California Ave & North Ave | 13258 | 41.89197 | -87.68936 | 41.91048 | -87.69689 | casual |
| 45100B63FDF5A9A6 | classic_bike | 2022-06-14 15:50:08 | 2022-06-14 16:02:13 | Smith Park | 643 | California Ave & North Ave | 13258 | 41.89205 | -87.68940 | 41.91048 | -87.69689 | casual |
| EF3DB1AEE698FDCD | electric_bike | 2022-05-13 12:19:37 | 2022-05-13 12:44:20 | Smith Park | 643 | Dearborn St & Van Buren St | 624 | 41.89197 | -87.68939 | 41.87627 | -87.62915 | member |
| 27D5BD34C70AA7B8 | electric_bike | 2022-03-17 08:25:07 | 2022-03-17 08:52:29 | Smith Park | 643 | Dearborn St & Van Buren St | 624 | 41.89204 | -87.68939 | 41.87627 | -87.62915 | member |
| 89A0A14D02160AA1 | classic_bike | 2022-06-15 08:42:06 | 2022-06-15 09:00:51 | Smith Park | 643 | Morgan St & Lake St* | chargingstx4 | 41.89205 | -87.68940 | 41.88549 | -87.65229 | member |
| 57128DE1B3FB05AC | electric_bike | 2022-10-29 14:55:25 | 2022-10-29 15:12:10 | Smith Park | 643 | Dearborn Pkwy & Delaware Pl | TA1307000128 | 41.89205 | -87.68941 | 41.89897 | -87.62991 | member |
| 2B7ADC371863F621 | electric_bike | 2022-07-07 19:00:20 | 2022-07-07 19:12:20 | Smith Park | 643 | California Ave & Francis Pl (Temp) | 13259 | 41.89202 | -87.68940 | 41.91849 | -87.69742 | casual |
| 1AB5EE1B78B7C9D4 | classic_bike | 2022-07-23 11:23:03 | 2022-07-23 11:42:30 | Smith Park | 643 | Orleans St & Hubbard St | 636 | 41.89205 | -87.68940 | 41.89003 | -87.63662 | member |
| C4A618BCA9EFF310 | electric_bike | 2022-07-17 15:28:16 | 2022-07-17 15:38:48 | Smith Park | 643 | Humboldt Blvd & Armitage Ave | 15651 | 41.89198 | -87.68944 | 41.91751 | -87.70181 | casual |
I created another table during data cleaning, I filtered and formatted the dataset to ensure that the analysis is based on relevant, consistent, and accurate data. Key steps in data cleaning include:: * Filtering based on ride_id length: Included rows where the length of ride_id is 16 characters.
Filtering based on rideable_type: Included rows where rideable_type is one of ‘electric_bike’, ‘docked_bike’, or ‘classic_bike’.
Formatting and filtering timestamps (started_at and ended_at): Included rows where started_at and ended_at follow the format ‘%Y-%m-%d %H:%M:%S %Z’ using FORMAT_TIMESTAMP function.
Filtering start_station_name: Included rows where start_station_name does not contain only numeric values.
Filtering based on start_station_id and end_station_id length: Included rows where the length of start_station_id and end_station_id is 3.
Filtering end_station_name: Included rows where end_station_name does not contain only numeric values.
Filtering based on member_casual: Included rows where member_casual is either ‘member’ or ‘casual’.
Round latitudes and longitudes: Rounded start_lat, start_lng, end_lat, and end_lng columns to 2 decimal places.
Filtering start_time and end_time: Filtered out rows where start_time is greater than end_time.
SELECT *
FROM `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
LIMIT 10
| ride_id | rideable_type | start_seconds | end_seconds | duration_seconds | duration_datetime_with_hms | start_day | start_weekday_name | start_month | start_month_name | end_day | end_weekday_name | end_month | end_month_name | year | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | user_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C92ECEDA1951A573 | electric_bike | 14949 | 16013 | 1064 | 2022-01-01 00:17:44 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Maplewood Ave & 59th St | 389 | Western Blvd & 48th Pl | 594 | 41.79 | -87.69 | 41.81 | -87.68 | casual | 1 |
| 6B93C46E8F5B114C | classic_bike | 134 | 1867 | 1733 | 2022-01-01 00:28:53 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Michigan Ave & 8th St | 623 | Michigan Ave & 8th St | 623 | 41.87 | -87.62 | 41.87 | -87.62 | casual | 1 |
| 2EBB46B84B754BE3 | electric_bike | 41490 | 41718 | 228 | 2022-01-01 00:03:48 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | State St & 79th St | 573 | Wabash Ave & 83rd St | 587 | 41.75 | -87.62 | 41.74 | -87.62 | member | 1 |
| 1A16C29235F5D0F8 | classic_bike | 8814 | 8904 | 90 | 2022-01-01 00:01:30 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Clark St & Newport St | 632 | Clark St & Newport St | 632 | 41.94 | -87.65 | 41.94 | -87.65 | casual | 1 |
| 5502838CDCC451EC | electric_bike | 3779 | 4288 | 509 | 2022-01-01 00:08:29 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Kedzie Ave & 48th Pl | 382 | St Louis Ave & 59th St | 387 | 41.80 | -87.70 | 41.79 | -87.71 | casual | 1 |
| EC550C7C56FC9353 | classic_bike | 36610 | 36867 | 257 | 2022-01-01 00:04:17 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Eastlake Ter & Rogers Ave | 523 | Greenview Ave & Jarvis Ave | 520 | 42.02 | -87.67 | 42.02 | -87.67 | member | 1 |
| ACF89B810C3C98A4 | classic_bike | 15251 | 15556 | 305 | 2022-01-01 00:05:05 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Ridge Blvd & Howard St | 514 | Paulina St & Howard St | 515 | 42.02 | -87.68 | 42.02 | -87.67 | member | 1 |
| 094D45EE86D10805 | electric_bike | 24431 | 25393 | 962 | 2022-01-01 00:16:02 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Glenwood Ave & Touhy Ave | 525 | Benson Ave & Church St | 596 | 42.01 | -87.67 | 42.05 | -87.68 | member | 1 |
| AFA77CF187BF02AD | classic_bike | 44243 | 46291 | 2048 | 2022-01-01 00:34:08 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Greenview Ave & Jarvis Ave | 520 | University Library (NU) | 605 | 42.02 | -87.67 | 42.05 | -87.67 | member | 1 |
| 6986904AD03646F1 | classic_bike | 38219 | 38433 | 214 | 2022-01-01 00:03:34 | 1 | Saturday | 1 | January | 1 | Saturday | 1 | January | 2022 | Eastlake Ter & Rogers Ave | 523 | Greenview Ave & Jarvis Ave | 520 | 42.02 | -87.67 | 42.02 | -87.67 | member | 1 |
SELECT
COUNT(DISTINCT start_station_name) AS num_of_station,
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
| num_of_station |
|---|
| 763 |
From the result of our query, we found out that the company has 763 stations. Recommendation: Cyclistic should focus on expanding the bike-sharing service by strategically placing new stations in areas with high demand and limited coverage.
SELECT
COUNT(DISTINCT CONCAT(CAST(start_lat AS STRING), ',', CAST(start_lng AS STRING))) AS unique_locations_count
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
| unique_locations_count |
|---|
| 326 |
we discovered that the company has 326 unique locations in which the 763 stations are distributed. Recommendation: Cyclistic should target areas with a high number of unique locations to improve service accessibility and user convenience.
SELECT
rideable_type,
COUNT(*) AS num_of_bike
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
rideable_type
| rideable_type | num_of_bike |
|---|---|
| electric_bike | 33010 |
| classic_bike | 20744 |
| docked_bike | 2329 |
The company has 3 types of bikes; Electric bike, Docked bike, and Classical bike. Recommendation: Cyclistic should invest in the most popular rideable types to meet user demand and potentially expand offerings for the preferred bike types.
SELECT
member_casual,
rideable_type,
COUNT(*) AS ride_count
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
member_casual, rideable_type
ORDER BY
rideable_type
| member_casual | rideable_type | ride_count |
|---|---|---|
| casual | classic_bike | 8242 |
| member | classic_bike | 12502 |
| casual | docked_bike | 2329 |
| casual | electric_bike | 18901 |
| member | electric_bike | 14109 |
The above query result gives us insight in the number of users that make use of the bikes in the company. we dicovered that casual users own all the docked bikes and most of the electric bikes. Recommendation: Understanding user preferences for specific rideable types can help tailor marketing strategies to each user category, encouraging increased usage and potential membership conversion.
WITH RideDurations AS (
SELECT
ride_id,
member_casual,
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(end_seconds)) * 60 + EXTRACT(MINUTE FROM TIMESTAMP_SECONDS(end_seconds)) - (EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) * 60 + EXTRACT(MINUTE FROM TIMESTAMP_SECONDS(start_seconds))) AS ride_duration_minutes
FROM
Case_Study_a.CyclisticData_cleaned
),
DurationSummary AS (
SELECT
member_casual,
COUNT(DISTINCT ride_id) AS total_rides,
SUM(ride_duration_minutes) AS total_duration,
ROUND(AVG(ride_duration_minutes), 2) AS average_duration
FROM
RideDurations
GROUP BY
member_casual
)
SELECT
ds.member_casual,
ds.total_rides,
ds.total_duration,
ds.average_duration
FROM
DurationSummary ds
| member_casual | total_rides | total_duration | average_duration |
|---|---|---|---|
| casual | 29472 | 659984 | 22.39 |
| member | 26611 | 310000 | 11.65 |
It was discovered that casual users has average duration of 22.39 minutes, while the member users has 11.65 minutes. Recommendation: Understanding ride durations for different user categories is crucial for designing pricing plans or promotions that encourage longer rides and increased engagement, potentially converting casual riders into annual members.
SELECT
member_casual,
start_lat,
start_lng,
start_station_name, -- Include the station name for context
end_lat,
end_lng,
end_station_name, -- Include the end station name for context
COUNT(*) AS ride_count
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
member_casual, start_lat, start_lng, start_station_name, end_lat, end_lng, end_station_name
ORDER BY
ride_count DESC
| member_casual | start_lat | start_lng | start_station_name | end_lat | end_lng | end_station_name | ride_count |
|---|---|---|---|---|---|---|---|
| casual | 41.87 | -87.62 | Michigan Ave & 8th St | 41.87 | -87.62 | Michigan Ave & 8th St | 1585 |
| casual | 41.97 | -87.65 | Lakefront Trail & Wilson Ave | 41.97 | -87.65 | Lakefront Trail & Wilson Ave | 913 |
| casual | 41.92 | -87.64 | Lincoln Park Conservatory | 41.92 | -87.64 | Lincoln Park Conservatory | 892 |
| member | 42.06 | -87.68 | Sheridan Rd & Noyes St (NU) | 42.05 | -87.68 | Benson Ave & Church St | 556 |
| member | 42.05 | -87.68 | Benson Ave & Church St | 42.06 | -87.68 | Sheridan Rd & Noyes St (NU) | 517 |
| casual | 42.06 | -87.68 | Sheridan Rd & Noyes St (NU) | 42.06 | -87.68 | Sheridan Rd & Noyes St (NU) | 491 |
| casual | 41.94 | -87.65 | Clark St & Newport St | 41.94 | -87.65 | Clark St & Newport St | 489 |
| member | 41.97 | -87.65 | Lakefront Trail & Wilson Ave | 41.97 | -87.65 | Lakefront Trail & Wilson Ave | 485 |
| casual | 42.06 | -87.68 | Sheridan Rd & Noyes St (NU) | 42.05 | -87.68 | Benson Ave & Church St | 446 |
| member | 41.91 | -87.65 | Halsted St & Clybourn Ave | 41.91 | -87.65 | Halsted St & Clybourn Ave | 444 |
Ride count tells us the most frequent routes with the station name and location where the station is located, also the ‘member_casual’ shows the user type that frequent that station the most. Recommendation: Cyclistic can optimize station locations and marketing strategies based on popular routes, enhancing user experience and encouraging more usage.
WITH HourlyRideCounts AS (
SELECT
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) AS hour_of_day,
member_casual,
COUNT(*) AS ride_count
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
hour_of_day, member_casual
),
MaxRideCountPerHour AS (
SELECT
hour_of_day,
MAX(ride_count) AS max_ride_count
FROM
HourlyRideCounts
GROUP BY
hour_of_day
)
SELECT
hrc.hour_of_day,
hrc.member_casual,
hrc.ride_count
FROM
HourlyRideCounts hrc
JOIN
MaxRideCountPerHour mrch ON hrc.hour_of_day = mrch.hour_of_day
WHERE
hrc.ride_count = mrch.max_ride_count
ORDER BY
hrc.hour_of_day
| hour_of_day | member_casual | ride_count |
|---|---|---|
| 0 | casual | 696 |
| 1 | casual | 478 |
| 2 | casual | 276 |
| 3 | casual | 224 |
| 4 | casual | 184 |
| 5 | casual | 243 |
| 6 | member | 542 |
| 7 | member | 1141 |
| 8 | member | 1471 |
| 9 | member | 1220 |
The result of the analysis shows that members make use of the bike mostly in hours of 6, 7, 8, 9, and 17, while casual users make use of bikes the most in the rest hours of the day. Recommendation: Cyclistic can optimize station locations and marketing strategies based on popular routes, enhancing user experience and encouraging more usage.
WITH HourlyAverages AS (
SELECT
member_casual,
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) AS hour_of_day,
start_lat,
start_lng
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
)
SELECT
member_casual,
hour_of_day,
ROUND(AVG(start_lat), 2) AS avg_start_lat,
ROUND(AVG(start_lng), 2) AS avg_start_lng
FROM
HourlyAverages
GROUP BY
member_casual, hour_of_day
ORDER BY
hour_of_day
| member_casual | hour_of_day | avg_start_lat | avg_start_lng |
|---|---|---|---|
| casual | 0 | 41.89 | -87.68 |
| member | 0 | 41.92 | -87.69 |
| casual | 1 | 41.90 | -87.68 |
| member | 1 | 41.94 | -87.68 |
| casual | 2 | 41.89 | -87.68 |
| member | 2 | 41.92 | -87.69 |
| member | 3 | 41.92 | -87.68 |
| casual | 3 | 41.89 | -87.70 |
| casual | 4 | 41.89 | -87.69 |
| member | 4 | 41.95 | -87.68 |
The result of the analysis uncovers the location with most users at each hour of the day and shows the user with most bikes in that particular hour and location. Recommendation: Understanding where users predominantly start their rides at different hours can help in station planning and targeted marketing efforts.
SELECT
member_casual,
SUM(CASE WHEN user_category = 'Weekend User' THEN 1 ELSE 0 END) AS weekend_users,
SUM(CASE WHEN user_category = 'Weekday User' THEN 1 ELSE 0 END) AS weekday_users
FROM
(
SELECT
member_casual,
CASE
WHEN start_weekday_name = 'Saturday' OR start_weekday_name = 'Sunday' THEN 'Weekend User'
WHEN end_weekday_name = 'Saturday' OR end_weekday_name = 'Sunday' THEN 'Weekend User'
ELSE 'Weekday User'
END AS user_category
FROM
`sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
) AS categorized_users
GROUP BY member_casual
| member_casual | weekend_users | weekday_users |
|---|---|---|
| casual | 9919 | 19553 |
| member | 6531 | 20080 |
The analysis shows that there are 9919 weekend users and 19553 weekday users who are casual users, while there are 6531 weekend users and 20080 weekday users who are members. Recommendation: Design targeted marketing campaigns for weekends and weekdays based on user preferences. Incentives or promotions for weekend users might encourage more casual riders to become members.
In conclusion, analyzing the usage patterns and behaviors of casual riders and annual members provides valuable insights for designing an effective marketing strategy. Here are the overarching recommendations:
Optimized Bike Allocation: Utilize insights from peak ride hours and popular routes to allocate bikes efficiently, ensuring availability during high-demand periods.
Targeted Promotions: Tailor marketing campaigns, discounts, and promotions based on user categories, peak hours, and popular locations to attract more riders and convert casual users into annual members.
Improved Station Placement: Use information on popular starting locations to plan and optimize station placements, ensuring easy access and increased utilization.
Enhanced User Experience: Focus on improving the user experience, such as ride durations and bike availability, to retain existing members and attract more casual users to convert.
By implementing these recommendations, Cyclistic can strategically enhance its marketing approach and user engagement, ultimately increasing the conversion rate of casual riders to annual members.