Business Task

The primary objective of this project is to analyze usage patterns and behavior of Cyclistic’s casual riders and annual members. By gaining insights into their preferences and behaviors, we aim to design an effective marketing strategy to convert casual riders into annual members, ultimately optimizing the company’s revenue and enhancing user engagement.

About Data

The data was provided by Motivate International Inc.  The data were stored in different csv files which are in months or in quarters.

Load Packages

These packages are necessary for the SQL queries to run on R Markdown

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(DBI)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.3     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(bigrquery)
bq_auth(path = "C:\\Users\\ARTHUR\\OneDrive\\Documents\\my_service_account_key.json")

Establish a Connection

Establish a connection between the R Markdown document and BigQuery.

con <- dbConnect(
  bigrquery::bigquery(),
  project = "sharp-messenger-396703"
)

Data Cleaning

*After a thorough examinations of data, i decided to make use of the data from the year 2022. Since they were in separate csv files, i created a table which contains data from the 12 months of the year 2022 exculding rows with NULL values.

SELECT *
FROM `sharp-messenger-396703.Case_Study_a.CyclisticData`
LIMIT 10
Displaying records 1 - 10
ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual
A80F66E86B532671 electric_bike 2022-08-12 10:45:23 2022-08-12 11:18:03 WestChi DIVVY 001 - Warehouse test station Normal Ave & Archer Ave TA1308000014 41.93242 -87.64276 41.84953 -87.64059 casual
75F38E47520D04F5 electric_bike 2022-08-02 16:21:10 2022-08-02 16:30:11 Smith Park 643 California Ave & North Ave 13258 41.89197 -87.68936 41.91048 -87.69689 casual
45100B63FDF5A9A6 classic_bike 2022-06-14 15:50:08 2022-06-14 16:02:13 Smith Park 643 California Ave & North Ave 13258 41.89205 -87.68940 41.91048 -87.69689 casual
EF3DB1AEE698FDCD electric_bike 2022-05-13 12:19:37 2022-05-13 12:44:20 Smith Park 643 Dearborn St & Van Buren St 624 41.89197 -87.68939 41.87627 -87.62915 member
27D5BD34C70AA7B8 electric_bike 2022-03-17 08:25:07 2022-03-17 08:52:29 Smith Park 643 Dearborn St & Van Buren St 624 41.89204 -87.68939 41.87627 -87.62915 member
89A0A14D02160AA1 classic_bike 2022-06-15 08:42:06 2022-06-15 09:00:51 Smith Park 643 Morgan St & Lake St* chargingstx4 41.89205 -87.68940 41.88549 -87.65229 member
57128DE1B3FB05AC electric_bike 2022-10-29 14:55:25 2022-10-29 15:12:10 Smith Park 643 Dearborn Pkwy & Delaware Pl TA1307000128 41.89205 -87.68941 41.89897 -87.62991 member
2B7ADC371863F621 electric_bike 2022-07-07 19:00:20 2022-07-07 19:12:20 Smith Park 643 California Ave & Francis Pl (Temp) 13259 41.89202 -87.68940 41.91849 -87.69742 casual
1AB5EE1B78B7C9D4 classic_bike 2022-07-23 11:23:03 2022-07-23 11:42:30 Smith Park 643 Orleans St & Hubbard St 636 41.89205 -87.68940 41.89003 -87.63662 member
C4A618BCA9EFF310 electric_bike 2022-07-17 15:28:16 2022-07-17 15:38:48 Smith Park 643 Humboldt Blvd & Armitage Ave 15651 41.89198 -87.68944 41.91751 -87.70181 casual

I created another table during data cleaning, I filtered and formatted the dataset to ensure that the analysis is based on relevant, consistent, and accurate data. Key steps in data cleaning include:: * Filtering based on ride_id length: Included rows where the length of ride_id is 16 characters.

SELECT *
FROM `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
LIMIT 10
Displaying records 1 - 10
ride_id rideable_type start_seconds end_seconds duration_seconds duration_datetime_with_hms start_day start_weekday_name start_month start_month_name end_day end_weekday_name end_month end_month_name year start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual user_count
C92ECEDA1951A573 electric_bike 14949 16013 1064 2022-01-01 00:17:44 1 Saturday 1 January 1 Saturday 1 January 2022 Maplewood Ave & 59th St 389 Western Blvd & 48th Pl 594 41.79 -87.69 41.81 -87.68 casual 1
6B93C46E8F5B114C classic_bike 134 1867 1733 2022-01-01 00:28:53 1 Saturday 1 January 1 Saturday 1 January 2022 Michigan Ave & 8th St 623 Michigan Ave & 8th St 623 41.87 -87.62 41.87 -87.62 casual 1
2EBB46B84B754BE3 electric_bike 41490 41718 228 2022-01-01 00:03:48 1 Saturday 1 January 1 Saturday 1 January 2022 State St & 79th St 573 Wabash Ave & 83rd St 587 41.75 -87.62 41.74 -87.62 member 1
1A16C29235F5D0F8 classic_bike 8814 8904 90 2022-01-01 00:01:30 1 Saturday 1 January 1 Saturday 1 January 2022 Clark St & Newport St 632 Clark St & Newport St 632 41.94 -87.65 41.94 -87.65 casual 1
5502838CDCC451EC electric_bike 3779 4288 509 2022-01-01 00:08:29 1 Saturday 1 January 1 Saturday 1 January 2022 Kedzie Ave & 48th Pl 382 St Louis Ave & 59th St 387 41.80 -87.70 41.79 -87.71 casual 1
EC550C7C56FC9353 classic_bike 36610 36867 257 2022-01-01 00:04:17 1 Saturday 1 January 1 Saturday 1 January 2022 Eastlake Ter & Rogers Ave 523 Greenview Ave & Jarvis Ave 520 42.02 -87.67 42.02 -87.67 member 1
ACF89B810C3C98A4 classic_bike 15251 15556 305 2022-01-01 00:05:05 1 Saturday 1 January 1 Saturday 1 January 2022 Ridge Blvd & Howard St 514 Paulina St & Howard St 515 42.02 -87.68 42.02 -87.67 member 1
094D45EE86D10805 electric_bike 24431 25393 962 2022-01-01 00:16:02 1 Saturday 1 January 1 Saturday 1 January 2022 Glenwood Ave & Touhy Ave 525 Benson Ave & Church St 596 42.01 -87.67 42.05 -87.68 member 1
AFA77CF187BF02AD classic_bike 44243 46291 2048 2022-01-01 00:34:08 1 Saturday 1 January 1 Saturday 1 January 2022 Greenview Ave & Jarvis Ave 520 University Library (NU) 605 42.02 -87.67 42.05 -87.67 member 1
6986904AD03646F1 classic_bike 38219 38433 214 2022-01-01 00:03:34 1 Saturday 1 January 1 Saturday 1 January 2022 Eastlake Ter & Rogers Ave 523 Greenview Ave & Jarvis Ave 520 42.02 -87.67 42.02 -87.67 member 1

Analyzing of Data

SELECT
  COUNT(DISTINCT start_station_name) AS num_of_station,
FROM
  `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
1 records
num_of_station
763

From the result of our query, we found out that the company has 763 stations. Recommendation: Cyclistic should focus on expanding the bike-sharing service by strategically placing new stations in areas with high demand and limited coverage.

SELECT
  COUNT(DISTINCT CONCAT(CAST(start_lat AS STRING), ',', CAST(start_lng AS STRING))) AS unique_locations_count
FROM
  `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
1 records
unique_locations_count
326

we discovered that the company has 326 unique locations in which the 763 stations are distributed. Recommendation: Cyclistic should target areas with a high number of unique locations to improve service accessibility and user convenience.

SELECT
  rideable_type,
  COUNT(*) AS num_of_bike
FROM
  `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
  rideable_type
3 records
rideable_type num_of_bike
electric_bike 33010
classic_bike 20744
docked_bike 2329

The company has 3 types of bikes; Electric bike, Docked bike, and Classical bike. Recommendation: Cyclistic should invest in the most popular rideable types to meet user demand and potentially expand offerings for the preferred bike types.

SELECT 
  member_casual,
  rideable_type,
  COUNT(*) AS ride_count
FROM 
  `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned` 
GROUP BY
  member_casual, rideable_type 
ORDER BY
  rideable_type
5 records
member_casual rideable_type ride_count
casual classic_bike 8242
member classic_bike 12502
casual docked_bike 2329
casual electric_bike 18901
member electric_bike 14109

The above query result gives us insight in the number of users that make use of the bikes in the company. we dicovered that casual users own all the docked bikes and most of the electric bikes. Recommendation: Understanding user preferences for specific rideable types can help tailor marketing strategies to each user category, encouraging increased usage and potential membership conversion.

WITH RideDurations AS (
    SELECT
        ride_id,
        member_casual,
        EXTRACT(HOUR FROM TIMESTAMP_SECONDS(end_seconds)) * 60 + EXTRACT(MINUTE FROM TIMESTAMP_SECONDS(end_seconds)) - (EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) * 60 + EXTRACT(MINUTE FROM TIMESTAMP_SECONDS(start_seconds))) AS ride_duration_minutes
    FROM
        Case_Study_a.CyclisticData_cleaned
),
DurationSummary AS (
    SELECT
        member_casual,
        COUNT(DISTINCT ride_id) AS total_rides,
        SUM(ride_duration_minutes) AS total_duration,
        ROUND(AVG(ride_duration_minutes), 2) AS average_duration
    FROM
        RideDurations
    GROUP BY
        member_casual
)
SELECT
    ds.member_casual,
    ds.total_rides,
    ds.total_duration,
    ds.average_duration
FROM
    DurationSummary ds
2 records
member_casual total_rides total_duration average_duration
casual 29472 659984 22.39
member 26611 310000 11.65

It was discovered that casual users has average duration of 22.39 minutes, while the member users has 11.65 minutes. Recommendation: Understanding ride durations for different user categories is crucial for designing pricing plans or promotions that encourage longer rides and increased engagement, potentially converting casual riders into annual members.

SELECT
  member_casual,
  start_lat,
  start_lng,
  start_station_name,  -- Include the station name for context
  end_lat,
  end_lng,
  end_station_name,    -- Include the end station name for context
  COUNT(*) AS ride_count
FROM
  `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
GROUP BY
  member_casual, start_lat, start_lng, start_station_name, end_lat, end_lng, end_station_name
ORDER BY
  ride_count DESC
Displaying records 1 - 10
member_casual start_lat start_lng start_station_name end_lat end_lng end_station_name ride_count
casual 41.87 -87.62 Michigan Ave & 8th St 41.87 -87.62 Michigan Ave & 8th St 1585
casual 41.97 -87.65 Lakefront Trail & Wilson Ave 41.97 -87.65 Lakefront Trail & Wilson Ave 913
casual 41.92 -87.64 Lincoln Park Conservatory 41.92 -87.64 Lincoln Park Conservatory 892
member 42.06 -87.68 Sheridan Rd & Noyes St (NU) 42.05 -87.68 Benson Ave & Church St 556
member 42.05 -87.68 Benson Ave & Church St 42.06 -87.68 Sheridan Rd & Noyes St (NU) 517
casual 42.06 -87.68 Sheridan Rd & Noyes St (NU) 42.06 -87.68 Sheridan Rd & Noyes St (NU) 491
casual 41.94 -87.65 Clark St & Newport St 41.94 -87.65 Clark St & Newport St 489
member 41.97 -87.65 Lakefront Trail & Wilson Ave 41.97 -87.65 Lakefront Trail & Wilson Ave 485
casual 42.06 -87.68 Sheridan Rd & Noyes St (NU) 42.05 -87.68 Benson Ave & Church St 446
member 41.91 -87.65 Halsted St & Clybourn Ave 41.91 -87.65 Halsted St & Clybourn Ave 444

Ride count tells us the most frequent routes with the station name and location where the station is located, also the ‘member_casual’ shows the user type that frequent that station the most. Recommendation: Cyclistic can optimize station locations and marketing strategies based on popular routes, enhancing user experience and encouraging more usage.

WITH HourlyRideCounts AS (
  SELECT
    EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) AS hour_of_day,
    member_casual,
    COUNT(*) AS ride_count
  FROM
    `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
  GROUP BY
    hour_of_day, member_casual
),
MaxRideCountPerHour AS (
  SELECT
    hour_of_day,
    MAX(ride_count) AS max_ride_count
  FROM
    HourlyRideCounts
  GROUP BY
    hour_of_day
)
SELECT
  hrc.hour_of_day,
  hrc.member_casual,
  hrc.ride_count
FROM
  HourlyRideCounts hrc
JOIN
  MaxRideCountPerHour mrch ON hrc.hour_of_day = mrch.hour_of_day
WHERE
  hrc.ride_count = mrch.max_ride_count
ORDER BY
  hrc.hour_of_day
Displaying records 1 - 10
hour_of_day member_casual ride_count
0 casual 696
1 casual 478
2 casual 276
3 casual 224
4 casual 184
5 casual 243
6 member 542
7 member 1141
8 member 1471
9 member 1220

The result of the analysis shows that members make use of the bike mostly in hours of 6, 7, 8, 9, and 17, while casual users make use of bikes the most in the rest hours of the day. Recommendation: Cyclistic can optimize station locations and marketing strategies based on popular routes, enhancing user experience and encouraging more usage.

WITH HourlyAverages AS (
  SELECT
    member_casual,
    EXTRACT(HOUR FROM TIMESTAMP_SECONDS(start_seconds)) AS hour_of_day,
    start_lat,
    start_lng
  FROM
    `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
)
SELECT
  member_casual,
  hour_of_day,
  ROUND(AVG(start_lat), 2) AS avg_start_lat,
  ROUND(AVG(start_lng), 2) AS avg_start_lng
FROM
  HourlyAverages
GROUP BY
  member_casual, hour_of_day
ORDER BY
  hour_of_day
Displaying records 1 - 10
member_casual hour_of_day avg_start_lat avg_start_lng
casual 0 41.89 -87.68
member 0 41.92 -87.69
casual 1 41.90 -87.68
member 1 41.94 -87.68
casual 2 41.89 -87.68
member 2 41.92 -87.69
member 3 41.92 -87.68
casual 3 41.89 -87.70
casual 4 41.89 -87.69
member 4 41.95 -87.68

The result of the analysis uncovers the location with most users at each hour of the day and shows the user with most bikes in that particular hour and location. Recommendation: Understanding where users predominantly start their rides at different hours can help in station planning and targeted marketing efforts.

SELECT
  member_casual,
  SUM(CASE WHEN user_category = 'Weekend User' THEN 1 ELSE 0 END) AS weekend_users,
  SUM(CASE WHEN user_category = 'Weekday User' THEN 1 ELSE 0 END) AS weekday_users
FROM
  (
    SELECT
      member_casual,
      CASE
        WHEN start_weekday_name = 'Saturday' OR start_weekday_name = 'Sunday' THEN 'Weekend User'
        WHEN end_weekday_name = 'Saturday' OR end_weekday_name = 'Sunday' THEN 'Weekend User'
        ELSE 'Weekday User'
      END AS user_category
    FROM
      `sharp-messenger-396703.Case_Study_a.CyclisticData_cleaned`
  ) AS categorized_users
GROUP BY member_casual
2 records
member_casual weekend_users weekday_users
casual 9919 19553
member 6531 20080

The analysis shows that there are 9919 weekend users and 19553 weekday users who are casual users, while there are 6531 weekend users and 20080 weekday users who are members. Recommendation: Design targeted marketing campaigns for weekends and weekdays based on user preferences. Incentives or promotions for weekend users might encourage more casual riders to become members.

Conclusion and Overall Recommendations

In conclusion, analyzing the usage patterns and behaviors of casual riders and annual members provides valuable insights for designing an effective marketing strategy. Here are the overarching recommendations:

By implementing these recommendations, Cyclistic can strategically enhance its marketing approach and user engagement, ultimately increasing the conversion rate of casual riders to annual members.